National Digital Library of Theses and Dissertations in Taiwan

Detailed Record

Author: 楊恕先
Author (in English): Shu-Sian Yang
Title: 基於卷積神經網路之語音辨識
Title (in English): Speech Recognition by Using Convolutional Neural Network
Advisor: 莊堯棠
Degree: Master's
Institution: National Central University
Department: Department of Electrical Engineering
Discipline: Engineering
Field: Electrical and Computer Engineering
Thesis Type: Academic thesis
Year of Publication: 2019
Graduation Academic Year: 107
Language: Chinese
Number of Pages: 64
Keywords (Chinese): 語音辨識, 深度學習, 神經網路
Keywords (English): speech recognition, deep learning, neural network
Usage statistics:
  • Times cited: 0
  • Views: 655
  • Rating:
  • Downloads: 36
  • Bookmarked: 0
Abstract: This thesis investigates speech recognition with deep learning. The proposed method first extracts speech feature parameters with Mel-frequency cepstral coefficients (MFCCs) and then feeds them into a convolutional neural network (CNN) for recognition.
The main difference from traditional speech recognition methods is that no acoustic model has to be built; for Chinese, this removes the need to construct and match a large inventory of initial-consonant (聲母) and final-vowel (韻母) models. Once the feature parameters have been obtained from the MFCCs, recognition is carried out entirely by the CNN, and the approach is not limited to any particular language.
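As an illustration of the feature-extraction stage described above, the following is a minimal MFCC sketch in Python. It assumes the librosa library is available; the file path, sampling rate, and number of coefficients are illustrative placeholders, not the settings used in the thesis.

import librosa

def extract_mfcc(wav_path, sr=16000, n_mfcc=13):
    # Load the utterance and resample it to a fixed rate.
    signal, sr = librosa.load(wav_path, sr=sr)
    # Compute the MFCC matrix (n_mfcc coefficients x frames).
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=n_mfcc)
    # Per-utterance mean/variance normalisation, a common optional step.
    mfcc = (mfcc - mfcc.mean(axis=1, keepdims=True)) / (mfcc.std(axis=1, keepdims=True) + 1e-8)
    return mfcc

The resulting matrix can then be padded or cropped to a fixed number of frames so that every utterance yields a network input of the same size.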
The thesis developed a speech recognition method for automatic speech recognition. In this speech recognition method, we obtained the speech feature parameters through Mel frequency cepstral coefficients and input a Convolutional Neural Network. The main difference between this Convolutional Neural Network speech recognition method and traditional speech recognition method is that it does not need to establish an acoustic model. For example, in Chinese, it saved a lot of time without establishing a large number of consonant and vowel models. After obtaining the speech feature parameters through the MFCCs, speech recognition is finished through Convolutional Neural Network.
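To make the recognition stage concrete, below is a minimal CNN classifier sketch in Python using the Keras API of TensorFlow. The layer sizes, the input shape (13 MFCC coefficients by 100 frames), and the number of output classes are hypothetical placeholders and do not reproduce the architecture evaluated in the thesis.

import tensorflow as tf

def build_cnn(input_shape=(13, 100, 1), num_classes=10):
    model = tf.keras.Sequential([
        # Two convolution + pooling stages, as in a typical small CNN.
        tf.keras.layers.Conv2D(32, (3, 3), padding="same", activation="relu",
                               input_shape=input_shape),
        tf.keras.layers.MaxPooling2D((2, 2)),
        tf.keras.layers.Conv2D(64, (3, 3), padding="same", activation="relu"),
        tf.keras.layers.MaxPooling2D((2, 2)),
        # Fully connected layers map the pooled features to class scores.
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ])
    # Adam is one of the weight-update methods discussed in the thesis
    # (alongside SGD and AdaGrad); it is used here purely as an example.
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

Each utterance's normalised MFCC matrix, padded to a fixed frame count, is treated as a single-channel image and classified directly, so no separate acoustic model is required.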
Abstract (in Chinese) I
Abstract II
Acknowledgements III
Table of Contents IV
List of Figures VI
List of Tables VIII
Chapter 1  Introduction 1
1-1 Research Motivation 1
1-2 Literature Review 2
1-3 Thesis Organization 4
Chapter 2  Speech Recognition 5
2-1 Preprocessing 6
Chapter 3  Convolutional Neural Networks 15
3-1 CNN Architecture 15
3-1-1 Convolutional Layer 16
3-1-2 Pooling Layer 18
3-1-3 Fully Connected Layer 21
3-2 Activation Functions 22
3-3 Weight Update Methods 25
3-3-1 Stochastic Gradient Descent (SGD) 26
3-3-2 AdaGrad 27
3-3-3 Adam 28
Chapter 4  Experimental Results 34
4-1 Effect of CNN Depth on Recognition 37
4-2 Effect of Activation Functions on Recognition 39
4-3 Effect of Weight-Update Methods on Recognition 41
4-4 Neural Network Optimization Methods 43
Chapter 5  Conclusions and Future Research Directions 44
5-1 Conclusions 44
5-2 Future Research Directions 46
References 47