National Digital Library of Theses and Dissertations in Taiwan

Detailed Record

Author: 楊恕先
Author (in English): Shu-Sian Yang
Title: 基於卷積神經網路之語音辨識
Title (in English): Speech Recognition by Using Convolutional Neural Network
Advisor: 莊堯棠
Degree: Master's
Institution: National Central University
Department: Department of Electrical Engineering
Discipline: Engineering
Field: Electrical and Computer Engineering
Thesis Type: Academic thesis
Year of Publication: 2019
Graduation Academic Year: 107
Language: Chinese
Number of Pages: 64
Keywords (Chinese): 語音辨識, 深度學習, 神經網路
Keywords (English): speech recognition, deep learning, neural network
Usage statistics:
  • Times cited: 0
  • Views: 655
  • Rating:
  • Downloads: 36
  • Bookmarked: 0
Abstract: This thesis investigates speech recognition with deep learning. The proposed method first extracts speech feature parameters with Mel-frequency cepstral coefficients (MFCCs) and then feeds them into a convolutional neural network (CNN) for recognition.
The main difference from traditional speech recognition methods is that no acoustic model has to be built; for Chinese, this removes the need to construct and match a large inventory of initial-consonant (聲母) and final-vowel (韻母) models. Once the feature parameters have been obtained from the MFCCs, recognition is carried out entirely by the CNN, and the approach is not limited to any particular language.
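As an illustration of the feature-extraction stage described above, the following is a minimal MFCC sketch in Python. It assumes the librosa library is available; the file path, sampling rate, and number of coefficients are illustrative placeholders, not the settings used in the thesis.

import librosa

def extract_mfcc(wav_path, sr=16000, n_mfcc=13):
    # Load the utterance and resample it to a fixed rate.
    signal, sr = librosa.load(wav_path, sr=sr)
    # Compute the MFCC matrix (n_mfcc coefficients x frames).
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=n_mfcc)
    # Per-utterance mean/variance normalisation, a common optional step.
    mfcc = (mfcc - mfcc.mean(axis=1, keepdims=True)) / (mfcc.std(axis=1, keepdims=True) + 1e-8)
    return mfcc

The resulting matrix can then be padded or cropped to a fixed number of frames so that every utterance yields a network input of the same size.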
The thesis developed a speech recognition method for automatic speech recognition. In this speech recognition method, we obtained the speech feature parameters through Mel frequency cepstral coefficients and input a Convolutional Neural Network. The main difference between this Convolutional Neural Network speech recognition method and traditional speech recognition method is that it does not need to establish an acoustic model. For example, in Chinese, it saved a lot of time without establishing a large number of consonant and vowel models. After obtaining the speech feature parameters through the MFCCs, speech recognition is finished through Convolutional Neural Network.
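To make the recognition stage concrete, below is a minimal CNN classifier sketch in Python using the Keras API of TensorFlow. The layer sizes, the input shape (13 MFCC coefficients by 100 frames), and the number of output classes are hypothetical placeholders and do not reproduce the architecture evaluated in the thesis.

import tensorflow as tf

def build_cnn(input_shape=(13, 100, 1), num_classes=10):
    model = tf.keras.Sequential([
        # Two convolution + pooling stages, as in a typical small CNN.
        tf.keras.layers.Conv2D(32, (3, 3), padding="same", activation="relu",
                               input_shape=input_shape),
        tf.keras.layers.MaxPooling2D((2, 2)),
        tf.keras.layers.Conv2D(64, (3, 3), padding="same", activation="relu"),
        tf.keras.layers.MaxPooling2D((2, 2)),
        # Fully connected layers map the pooled features to class scores.
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ])
    # Adam is one of the weight-update methods discussed in the thesis
    # (alongside SGD and AdaGrad); it is used here purely as an example.
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

Each utterance's normalised MFCC matrix, padded to a fixed frame count, is treated as a single-channel image and classified directly, so no separate acoustic model is required.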
Abstract (in Chinese) I
Abstract II
Acknowledgements III
Table of Contents IV
List of Figures VI
List of Tables VIII
Chapter 1  Introduction 1
1-1 Research Motivation 1
1-2 Literature Review 2
1-3 Thesis Organization 4
Chapter 2  Speech Recognition 5
2-1 Preprocessing 6
Chapter 3  Convolutional Neural Networks 15
3-1 CNN Architecture 15
3-1-1 Convolutional Layer 16
3-1-2 Pooling Layer 18
3-1-3 Fully Connected Layer 21
3-2 Activation Functions 22
3-3 Weight Update Methods 25
3-3-1 Stochastic Gradient Descent (SGD) 26
3-3-2 AdaGrad 27
3-3-3 Adam 28
Chapter 4  Experimental Results 34
4-1 Effect of CNN Depth on Recognition 37
4-2 Effect of Activation Functions on Recognition 39
4-3 Effect of Weight-Update Methods on Recognition 41
4-4 Neural Network Optimization Methods 43
Chapter 5  Conclusions and Future Research Directions 44
5-1 Conclusions 44
5-2 Future Research Directions 46
References 47