|
[1] A. Waibel, T. Hanazawa, G. Hinton, K. Shikano, and K. J. Lang, “Phoneme Recognition Using Time-Delay Neural Networks,” IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 37, no. 3, pp. 328–339, 1989.
[2] V. Peddinti, D. Povey, and S. Khudanpur, “A Time Delay Neural Network Architecture for Efficient Modeling of Long Temporal Contexts,” in Proc. Interspeech, 2015.
[3] H. Zheng, Z. Yang, L. Qiao, J. Li, and W. Liu, “Attribute Knowledge Integration for Speech Recognition Based on Multi-task Learning Neural Networks,” in Proc. Interspeech, 2015.
[4] G. Hinton, L. Deng, D. Yu, G. Dahl, A. R. Mohamed, N. Jaitly, A. Senior, V. Vanhoucke, P. Nguyen, T. Sainath, and B. Kingsbury, “Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups,” IEEE Signal Processing Magazine, vol. 29, no. 6, pp. 82–97, 2012.
[5] A. Graves, S. Fernández, F. Gomez, and J. Schmidhuber, “Connectionist Temporal Classification: Labelling Unsegmented Sequence Data with Recurrent Neural Networks,” in Proc. ICML, 2006.
[6] A. Senior, H. Sak, F. de Chaumont Quitry, T. Sainath, K. Rao et al., “Acoustic Modelling with CD-CTC-SMBR LSTM RNNs,” in Proc. IEEE ASRU, 2015.
[7] J. Li, G. Ye, A. Das, R. Zhao, and Y. Gong, “Advancing Acoustic-to-Word CTC Model,” in Proc. IEEE ICASSP, 2018.
[8] D. Povey, A. Ghoshal, G. Boulianne, L. Burget, O. Glembek, N. Goel, M. Hannemann, P. Motlicek, Y. Qian, P. Schwarz, J. Silovsky, G. Stemmer, and K. Vesely, “The Kaldi Speech Recognition Toolkit,” in Proc. IEEE ASRU, 2011.
[9] Yuan Jiahua (袁家樺) et al., Hanyu Fangyan Gaiyao (《漢語方言概要》) [An Outline of the Chinese Dialects]. Yuwen Chubanshe (語文出版社), 1960.
[10] S. King and P. Taylor, “Detection of Phonological Features in Continuous Speech Using Neural Networks,” Computer Speech and Language, vol. 14, no. 4, pp. 333–353, 2000.
[11] C.-H. Lee, M. A. Clements, S. Dusan, E. Fosler-Lussier, K. Johnson, B.-H. Juang, and L. R. Rabiner, “An Overview on Automatic Speech Attribute Transcription (ASAT),” in Proc. Interspeech, 2007.
[12] C. Zhang, Y. Liu, and C.-H. Lee, “Detection-based Accented Speech Recognition Using Articulatory Features,” in Proc. IEEE ASRU, 2011.
[13] I. Bromberg, Q. Fu, J. Hou, J. Li, C. Ma, B. Matthews, A. Moreno-Daniel, J. Morris, S. M. Siniscalchi, Y. Tsao, and Y. Wang, “Detection-Based ASR in the Automatic Speech Attribute Transcription Project,” in Proc. Interspeech, 2007.
[14] D. Yu, S. M. Siniscalchi, L. Deng, and C.-H. Lee, “Boosting Attribute and Phone Estimation Accuracies with Deep Neural Networks for Detection-based Speech Recognition,” in Proc. ICASSP, 2012.
[15] S. M. Siniscalchi, J. Li, and C.-H. Lee, “A Study on Lattice Rescoring with Knowledge Scores for Automatic Speech Recognition,” in Proc. Interspeech, 2006.
[16] W. Li, S. M. Siniscalchi, N. F. Chen, and C.-H. Lee, “Improving Non-native Mispronunciation Detection and Enriching Diagnostic Feedback with DNN-based Speech Attribute Modeling,” in Proc. ICASSP, 2016.
[17] R. Duan, T. Kawahara, M. Dantsuji, and J. Zhang, “Pronunciation Error Detection using DNN Articulatory Model Based on Multi-lingual and Multi-task Learning,” in Proc. ISCSLP, 2016.
[18] R. Duan, T. Kawahara, M. Dantsuji, and H. Nanjo, “Efficient Learning of Articulatory Models Based on Multi-Label Training and Label Correction for Pronunciation Learning,” in Proc. ICASSP, 2018.
[19] R. A. Caruana, “Multitask Learning: A Knowledge-Based Source of Inductive Bias,” in Proc. ICML, 1993.
[20] T. Evgeniou and M. Pontil, “Regularized Multi-task Learning,” in Proc. ACM SIGKDD, 2004.
[21] P. Kenny, “Joint Factor Analysis of Speaker and Session Variability: Theory and Algorithms,” CRIM, Montreal, (Report) CRIM-06/08-13, vol. 14, pp. 28–29, 2005.
[22] International Phonetic Association, Handbook of the International Phonetic Association: A Guide to the Use of the International Phonetic Alphabet. Cambridge University Press, 1999.
[23] H.-M. Wang, B. Chen, J.-W. Kuo, and S.-S. Cheng, “MATBN: A Mandarin Chinese Broadcast News Corpus,” International Journal of Computational Linguistics & Chinese Language Processing, vol. 10, no. 2, pp. 219–236, 2005.
[24] S. Broman and M. Kurimo, “Methods for Combining Language Models in Speech Recognition,” in Proc. Interspeech, 2005.
[25] D. Povey and K. Vesely, “Sequence-discriminative Training of Deep Neural Networks,” in Proc. Interspeech, 2013.
[26] T. Ko, V. Peddinti, D. Povey, and S. Khudanpur, “Audio Augmentation for Speech Recognition,” in Proc. Interspeech, 2015.
|