臺灣博碩士論文加值系統 (National Digital Library of Theses and Dissertations in Taiwan)

Detailed Record
Author: 馬俊力 (Alim Misbullah)
Title: 深層長短期記憶網絡用於語音辨識之研究 (Deep Long Short-Term Memory Networks for Speech Recognition)
Advisor: 簡仁宗 (Chien, Jen-Tzung)
Committee: 李琳山 (Lee, Lin-Shan), 陳信宏 (Chen, Sin-Horng), 王新民 (Wang, Hsin-Min), 曹昱 (Tsao, Yu), 簡仁宗 (Chien, Jen-Tzung)
Oral Defense Date: 2015-08-25
Degree: Master's
Institution: 國立交通大學 (National Chiao Tung University)
Department: 電機資訊國際學程 (EECS International Graduate Program)
Discipline: Engineering
Field: Electrical and Computer Engineering
Document Type: Academic thesis
Year of Publication: 2015
Academic Year of Graduation: 104
Language: English
Pages: 114
Keywords (Chinese): 聲學模型, 語音辨識, 前饋式類神經網路, 長短期記憶模型
Keywords (English): acoustic modeling, speech recognition, feedforward neural network, long short-term memory
Times Cited: 4
Views: 968
Downloads: 109
Bookmarked: 0
Speech recognition systems based on deep learning have been shown to significantly improve recognition accuracy. In recent years, the feedforward neural network (FNN) and the recurrent neural network (RNN) have been the common ways of realizing deep learning for acoustic modeling. An FNN extracts deep, abstract, and invariant features through multiple layers of nonlinear transformation, whereas an RNN captures the latent information in temporal sequence data through recurrence. The long short-term memory (LSTM) model can effectively store historical information and has been shown to handle long-range dependencies in sequence data more effectively than the conventional RNN. This thesis combines the strengths of feedforward and recurrent neural networks and proposes a novel deep long short-term memory architecture, realizing cascade modules including FNN-LSTM, LSTM-FNN, LSTM-FNN-FNN, and LSTM-FNN-LSTM, and stacking these cascade modules into deeper network architectures. In the experimental evaluation, the proposed deep architectures are implemented with the Kaldi speech recognition toolkit. Results on the 3rd CHiME Challenge and the Aurora-4 corpus show that deep architectures combining FNNs and LSTMs effectively improve speech recognition accuracy in noisy environments.
Speech recognition has been significantly improved by applying acoustic models based on the deep neural network (DNN), which can be realized as the feedforward neural network (FNN) or the recurrent neural network (RNN). The FNN is feasible for projecting the observations onto a deep invariant feature space, while the RNN is beneficial for capturing the temporal information in sequence data. The RNN based on the long short-term memory (LSTM) is capable of memorizing the inputs over a long time period and thus exploiting a self-learnt amount of long-range temporal context. By considering the complementary modeling capabilities of the FNN and the RNN, we present a new DNN architecture which is constructed by cascading LSTM and FNN in different ways and stacking the cascades of (1) FNN-LSTM, (2) LSTM-FNN, (3) LSTM-FNN-FNN, and (4) LSTM-FNN-LSTM in a deep model structure. Through the cascade of the LSTM cells and the fully-connected feedforward units, we build the deep long short-term memory network, which explores the temporal patterns and summarizes the long history of previous inputs in a deep learning machine. In the experiments, different architectures and topologies are investigated using the open-source Kaldi toolkit. The experiments on the 3rd CHiME challenge and Aurora-4 show that the stacks of hybrid LSTM and FNN outperform the stand-alone FNN, the stand-alone LSTM, and the other hybrid systems for noisy speech recognition.
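The cascade idea in the abstract (an LSTM layer feeding a fully-connected feedforward layer, with such modules stacked for depth) can be sketched in a few lines. The following is a toy pure-Python illustration of one LSTM-FNN module, not the thesis's Kaldi-based implementation: all names, dimensions, initializations, and the tanh activation in the feedforward layer are assumptions chosen for demonstration, and no training is shown.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(W, x):
    """Matrix-vector product for weight matrix W (list of rows) and vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def vadd(*vs):
    """Element-wise sum of several equal-length vectors."""
    return [sum(t) for t in zip(*vs)]

class LSTMCell:
    """One standard LSTM layer: input, forget, output gates and candidate input."""
    def __init__(self, n_in, n_hid, seed=0):
        rng = random.Random(seed)
        init = lambda r, c: [[rng.uniform(-0.1, 0.1) for _ in range(c)] for _ in range(r)]
        # one input-weight and one recurrent-weight matrix per gate i, f, o, g
        self.Wx = {g: init(n_hid, n_in) for g in "ifog"}
        self.Wh = {g: init(n_hid, n_hid) for g in "ifog"}
        self.b = {g: [0.0] * n_hid for g in "ifog"}
        self.n_hid = n_hid

    def step(self, x, h, c):
        gate = lambda name, act: [act(v) for v in vadd(
            matvec(self.Wx[name], x), matvec(self.Wh[name], h), self.b[name])]
        i = gate("i", sigmoid)    # input gate
        f = gate("f", sigmoid)    # forget gate
        o = gate("o", sigmoid)    # output gate
        g = gate("g", math.tanh)  # candidate cell input
        c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
        h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
        return h_new, c_new

def fnn_layer(W, b, x):
    """Fully-connected feedforward layer with tanh activation."""
    return [math.tanh(v + bj) for v, bj in zip(matvec(W, x), b)]

def lstm_fnn(cell, W, b, sequence):
    """LSTM-FNN cascade: run the LSTM over a feature sequence, then pass each
    hidden state through a feedforward layer. Deeper networks would stack
    several such modules (e.g. LSTM-FNN-LSTM)."""
    h = [0.0] * cell.n_hid
    c = [0.0] * cell.n_hid
    outputs = []
    for x in sequence:
        h, c = cell.step(x, h, c)
        outputs.append(fnn_layer(W, b, h))
    return outputs
```

For example, a cell with 3 input features and 4 hidden units, followed by a 2-unit feedforward layer, maps a sequence of 5 frames to 5 output vectors of dimension 2. In the thesis, the final layer of such a stack would instead produce HMM state posteriors for decoding.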
Chinese Abstract. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . i
English Abstract. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ii
Acknowledgement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iii
Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iv
List of Tables. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viii
List of Figures. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . x
List of Notations. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xi
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.3 Outline of this Thesis . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2 Automatic Speech Recognition. . . . . . . . . . . . . . . . . . . . . . . 9
2.1 Speech Communication . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.2 Statistical Speech Recognition . . . . . . . . . . . . . . . . . . . . . . 10
2.3 Front-End Processing . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.3.1 Feature extraction and transformation . . . . . . . . . . . . . 11
2.3.2 Speaker-adaptive features . . . . . . . . . . . . . . . . . . . . 14
2.3.3 Discriminative features . . . . . . . . . . . . . . . . . . . . . . 15
2.4 Acoustic Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.4.1 Gaussian mixture model . . . . . . . . . . . . . . . . . . . . . 16
2.4.2 Hidden Markov model . . . . . . . . . . . . . . . . . . . . . . 17
2.4.3 Forward-backward algorithm . . . . . . . . . . . . . . . . . . . 18
2.4.4 Viterbi decoding . . . . . . . . . . . . . . . . . . . . . . . . . 20
2.4.5 Expectation-maximization algorithm . . . . . . . . . . . . . . 21
2.5 Language Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
2.5.1 Back-off smoothing . . . . . . . . . . . . . . . . . . . . . . . . 25
2.6 Decoding and Search Algorithm . . . . . . . . . . . . . . . . . . . . . 26
3 Neural Network Acoustic Models . . . . . . . . . . . . . . . . . . . . . . . 28
3.1 Feedforward Neural Network . . . . . . . . . . . . . . . . . . . . . . . 28
3.1.1 Error backpropagation . . . . . . . . . . . . . . . . . . . . . . 31
3.1.2 Stochastic gradient descent . . . . . . . . . . . . . . . . . . . . 31
3.2 Deep Neural Network . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
3.2.1 Restricted Boltzmann machine . . . . . . . . . . . . . . . . . . 32
3.2.2 Pre-training and cross-entropy training . . . . . . . . . . . . . 37
3.2.3 Regularization in deep neural network . . . . . . . . . . . . . 38
3.3 Recurrent Neural Network . . . . . . . . . . . . . . . . . . . . . . . . 41
3.3.1 Backpropagation through time . . . . . . . . . . . . . . . . . . 42
3.3.2 Bidirectional recurrent neural network . . . . . . . . . . . . . 45
4 Long Short-Term Memory . . . . . . . . . . . . . . . . . . . . . . . 48
4.1 LSTM Network Architecture . . . . . . . . . . . . . . . . . . . . . . . 48
4.1.1 Gradient calculation . . . . . . . . . . . . . . . . . . . . . . . 50
4.1.2 Network equations . . . . . . . . . . . . . . . . . . . . . . . . 53
4.1.3 Parameterization and complexity . . . . . . . . . . . . . . . . . 57
4.2 LSTM Network Variants . . . . . . . . . . . . . . . . . . . . . . . . . 58
4.2.1 LSTM recurrent projection network . . . . . . . . . . . . . . . 58
4.2.2 Bidirectional LSTM network . . . . . . . . . . . . . . . . . . . 59
4.3 Joining FNN and LSTM . . . . . . . . . . . . . . . . . . . . . . . . . 60
4.3.1 FNN-LSTM . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
4.3.2 LSTM-FNN . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
4.3.3 LSTM-FNN-LSTM . . . . . . . . . . . . . . . . . . . . . . . . 62
4.4 Deep Long Short-Term Memory Network . . . . . . . . . . . . . . . . 63
5 Experiments . . . . . . . . . . . . . . . . . . . . . . . 67
5.1 Experimental Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
5.2 Kaldi Toolkit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
5.3 Speech Corpora . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
5.3.1 Aurora4 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
5.3.2 CHiME3 challenge datasets . . . . . . . . . . . . . . . . . . . 71
5.4 Recipes of Speech Recognition Systems . . . . . . . . . . . . . . . . . 71
5.5 Baseline Gaussian Mixture Model . . . . . . . . . . . . . . . . . . . . 73
5.6 Deep Neural Network Model . . . . . . . . . . . . . . . . . . . . . . . 74
5.6.1 Evaluation on depth and width of DNN . . . . . . . . . . . . 76
5.7 Long Short-Term Memory Model . . . . . . . . . . . . . . . . . . . . 77
5.7.1 Evaluation on memory cells . . . . . . . . . . . . . . . . . . . 78
5.7.2 Evaluation on hybrid network . . . . . . . . . . . . . . . . . . 79
5.7.3 Evaluation on deep networks . . . . . . . . . . . . . . . . . . . 84
5.7.4 Compiling result from different networks . . . . . . . . . . . . 89
6 Conclusions and Future Works . . . . . . . . . . . . . . . . . . . . . . . 96
6.1 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
6.2 Future Works . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
Bibliography . . . . . . . . . . . . . . . . . . . . . . . 98
Appendix. . . . . . . . . . . . . . . . . . . . . . . 111
A Word Error Rate and Abbreviations . . . . . . . . . . . . . . . . . . . . . . . 111
A.1 Calculation of Word Error Rate . . . . . . . . . . . . . . . . . . . . . 111
A.2 List of Abbreviations . . . . . . . . . . . . . . . . . . . . . . . . . . . 112

