|
|