Thesis title (English): The Research of Hidden Markov Model Applied on Mandarin Recognition
Keywords (English): digital signal processing; HMM; speaker-adaptive; Bayesian adaptation
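A speaker-adaptive recognition system can be trained on a small amount of speech from a new speaker and still reach a recognition rate close to that of a speaker-dependent system. This thesis studies the use of digital signal processing techniques to extract feature parameters from the speech signal, a speech recognition algorithm based on the continuous density hidden Markov model (CDHMM), and the application of Bayesian adaptation to training a speaker-independent system for speaker adaptation.
For feature extraction, mel spectral coefficients are the main parameters, and a separate CDHMM is built for each speech sound. Recognition uses the Viterbi algorithm to find the most probable result. For speaker adaptation, a suitable existing model is chosen as the base reference model, and the new speaker's speech samples are incorporated through Bayesian adaptation within the segmental k-means training algorithm, which raises the system's recognition rate for the new speaker.
The system was tested and evaluated on a personal computer running the Windows 98 operating system, establishing a practical, working speech recognition system.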
As computers and networks become more widespread, many tasks can be completed easily by computer. With speech as the input, users need not memorize any rules to operate the computer. This is friendlier for people with little computer experience, who can then communicate with the computer easily. Speech signal processing has therefore become an important research topic.
A speaker-adaptive system makes use of the knowledge contained in a reliably trained reference system, so a small amount of training data is sufficient to approach the performance of a speaker-dependent system. This thesis therefore studies speech feature parameter extraction using digital signal processing techniques. The continuous density hidden Markov model (CDHMM) serves as the basis of the speech recognition system, and Bayesian adaptation is applied to adapt a speaker-independent system to a new speaker.
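To illustrate why a small amount of adaptation data can suffice, the following is a minimal sketch of a Bayesian (MAP) update for a single Gaussian mean, assuming each frame has already been assigned to one state (hard assignment). The function name `map_adapt_mean` and the prior weight `tau` are hypothetical; the abstract does not give the thesis's exact formulation, so this shows only the commonly used form of the mean update.

```python
def map_adapt_mean(prior_mean, frames, tau=10.0):
    """MAP estimate of a Gaussian mean under hard frame assignment.

    prior_mean: mean vector from the reference (speaker-independent) model
    frames:     feature vectors from the new speaker assigned to this state
    tau:        hypothetical prior weight; larger values trust the prior more
    """
    n = len(frames)
    dim = len(prior_mean)
    # Per-dimension sum of the new speaker's frames.
    sums = [sum(frame[d] for frame in frames) for d in range(dim)]
    # Weighted combination of the prior mean and the new data.
    return [(tau * prior_mean[d] + sums[d]) / (tau + n) for d in range(dim)]


adapted = map_adapt_mean([0.0, 0.0], [[2.0, 4.0], [2.0, 4.0]], tau=2.0)
```

With no adaptation frames the estimate stays at the reference mean, and as more frames arrive it moves toward the new speaker's sample mean.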
In the feature extraction stage, mel spectral coefficients are used as the feature parameters of the speech. A separate CDHMM is built for each Mandarin speech sound. For recognition, the Viterbi algorithm finds the most probable result in the HMM. In the speaker adaptation stage, a suitable existing model is chosen as the base reference model, and Bayesian adaptation is integrated into the segmental k-means training procedure, which improves the recognition rate for a new speaker.
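The Viterbi step mentioned above can be sketched as follows. This is a generic log-domain implementation, not the thesis's code: the names `safe_log` and `viterbi` are illustrative, and the per-frame emission log-likelihoods are assumed to have been computed beforehand from each state's Gaussian mixture.

```python
import math


def safe_log(p):
    """Log of a probability, mapping zero to negative infinity."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(log_init, log_trans, log_emit):
    """Return (best_state_path, best_log_probability).

    log_init[i]     : log P(state i at the first frame)
    log_trans[i][j] : log P(next state j | current state i)
    log_emit[t][j]  : log-likelihood of frame t under state j
    """
    n_states = len(log_init)
    delta = [log_init[j] + log_emit[0][j] for j in range(n_states)]
    backptr = []
    for t in range(1, len(log_emit)):
        new_delta, ptr = [], []
        for j in range(n_states):
            # Best predecessor state for arriving in state j at frame t.
            best_i = max(range(n_states),
                         key=lambda i: delta[i] + log_trans[i][j])
            ptr.append(best_i)
            new_delta.append(delta[best_i] + log_trans[best_i][j]
                             + log_emit[t][j])
        delta = new_delta
        backptr.append(ptr)
    # Trace back from the best final state.
    last = max(range(n_states), key=lambda j: delta[j])
    path = [last]
    for ptr in reversed(backptr):
        path.append(ptr[path[-1]])
    path.reverse()
    return path, delta[last]


# Toy two-state left-to-right model with three frames (made-up numbers).
init = [safe_log(p) for p in (1.0, 0.0)]
trans = [[safe_log(p) for p in row] for row in ((0.5, 0.5), (0.0, 1.0))]
emit = [[safe_log(p) for p in row]
        for row in ((0.9, 0.1), (0.8, 0.2), (0.1, 0.9))]
path, log_prob = viterbi(init, trans, emit)
```

In an isolated-word recognizer of this kind, the same routine would be run against each word model and the model with the highest score chosen. The returned state path also supplies the frame-to-state segmentation that segmental k-means training re-estimates from.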
Finally, a working speech recognition system was built, trained, and tested under the Windows 98 operating system.
Chapter 1 Introduction
Chapter 2 Theoretical Foundations of Speech Signal Recognition and Adaptation
2.1 Preprocessing of Speech Signals
2.1.1 Energy Measurement
2.1.2 Zero-Crossing Rate
2.1.3 Speech Signal Segmentation
2.2 Speech Signal Features and Feature Parameter Extraction
2.2.1 Mel Cepstral Parameters
2.3 Hidden Markov Models
2.4 Building the Hidden Markov Model
2.4.1 Probability Computation
2.4.2 Forward Procedure
2.4.3 Backward Procedure
2.5 Viterbi Algorithm
2.6 Parameter Reestimation
2.7 Speaker Adaptation
2.7.1 Bayesian Adaptation
Chapter 3 Building the Speech Signal Recognition and Adaptation System
3.1.1 Mel Spectrum
Chapter 4 Experimental Results and Comparison
4.1 Effect of the Number of Speech Samples on the Recognition System
4.2 Effect of the Number of HMM States on the Recognition System
4.3 Speaker Adaptation System
Chapter 5 Conclusions and Future Work
5.1 Conclusions
5.2 Future Work
