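Author: 吳光杰 (Kuang-chieh Wu)
Title (Chinese): 加成性雜訊環境下倒頻譜統計正規化法於強健性語音辨識之研究
Title (English): Study of Cepstral Statistics Normalization Techniques for Robust Speech Recognition in Additive Noise Environments
Advisor: 洪志偉 (Jeih-weih Hung)
Degree: Master's
Institution: 國立暨南國際大學 (National Chi Nan University)
Department: Department of Electrical Engineering
Discipline: Engineering
Field: Electrical and Computer Engineering
Thesis type: Academic thesis
Year of publication: 2008
Graduation academic year: 96 (ROC calendar)
Language: Chinese
Pages: 81
Keywords (translated from Chinese): automatic speech recognition; codebook; robust speech features
Keywords (English): automatic speech recognition; feature normalization approaches; noise robustness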
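The recognition accuracy of an automatic speech recognition system usually degrades noticeably in noisy environments, and overcoming this problem effectively has long been a central research topic in the field. This thesis studies the problem and proposes several improved techniques. A family of earlier techniques reduces the effect of noise by normalizing the statistical characteristics of speech features, for example cepstral mean subtraction (CMS), cepstral mean and variance normalization (CMVN), and histogram equalization (HEQ). These methods have been shown to be effective at improving the robustness of speech features in noisy environments. Taking these three cepstral feature normalization techniques as its starting point, this thesis develops a series of improved robustness methods.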
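As an illustration of the first two techniques named above, here is a minimal Python sketch of utterance-based CMS and CMVN. It is not taken from the thesis: the function names, the list-of-frames data layout, and the `eps` guard are assumptions made for this example, and only the standard textbook definitions are used (subtract the per-dimension mean; additionally divide by the per-dimension standard deviation).

```python
from statistics import fmean, pstdev

def cms(frames):
    """Cepstral mean subtraction: remove the per-dimension utterance mean."""
    means = [fmean(dim) for dim in zip(*frames)]   # one mean per cepstral dimension
    return [[x - m for x, m in zip(frame, means)] for frame in frames]

def cmvn(frames, eps=1e-8):
    """Mean and variance normalization: zero mean, unit variance per dimension."""
    dims = list(zip(*frames))
    means = [fmean(dim) for dim in dims]
    # eps (an assumption of this sketch) avoids division by zero for constant dimensions.
    stds = [max(pstdev(dim), eps) for dim in dims]
    return [[(x - m) / s for x, m, s in zip(frame, means, stds)] for frame in frames]

# Toy "utterance": 4 frames of 2-dimensional cepstral features (made-up numbers).
utterance = [[1.0, 10.0], [2.0, 12.0], [3.0, 14.0], [4.0, 16.0]]
cms_out = cms(utterance)
cmvn_out = cmvn(utterance)
```

In this sketch the statistics come from the utterance being normalized, which corresponds to the utterance-based variants; a codebook-based variant would obtain the mean and variance from a codebook instead.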

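The feature statistics required by the three normalization techniques above are usually estimated from the features of a whole utterance or of a segment of an utterance. Earlier work in our laboratory estimated these statistics with a codebook-based approach and found a clear improvement over the previous practice. In the first part of this thesis, we propose an improved codebook construction procedure. It uses voice activity detection (VAD) to separate the speech and non-speech components of the signal, builds the codebook from the features of the speech portion, and assigns a weight to each codeword in the resulting codebook. Experiments confirm that a codebook built this way improves the performance of the original codebook-based feature normalization methods. In the second part, we integrate the feature statistics obtained by the codebook-based and utterance-based approaches to develop what we call associative feature normalization methods. Compared with the utterance-based and codebook-based methods, the new associative methods perform better and more effectively improve speech recognition accuracy in additive noise environments.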
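The codebook construction outlined above (VAD, then a codebook built from speech frames, then a weight per codeword) can be sketched roughly as follows. The abstract does not say which VAD, which clustering algorithm, or how the weights are computed, so everything specific here is an assumption: a simple energy threshold stands in for the VAD, plain k-means stands in for the codebook training, and each weight is taken to be the fraction of speech frames assigned to that codeword. All function names are hypothetical.

```python
def energy_vad(frames, energies, threshold):
    """Keep frames whose energy exceeds a threshold (a stand-in for a real VAD)."""
    return [f for f, e in zip(frames, energies) if e > threshold]

def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def build_weighted_codebook(frames, k, iterations=10):
    """Plain k-means on speech frames; assumes len(frames) >= k.

    Returns (codewords, weights); each weight is the share of frames
    assigned to that codeword in the final assignment pass.
    """
    step = max(len(frames) // k, 1)
    codewords = [list(frames[i * step]) for i in range(k)]  # deterministic init
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for f in frames:
            nearest = min(range(k), key=lambda j: sq_dist(f, codewords[j]))
            clusters[nearest].append(f)
        for j, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous codeword
                codewords[j] = [sum(col) / len(members) for col in zip(*members)]
    weights = [len(members) / len(frames) for members in clusters]
    return codewords, weights

# Toy data: the last frame has low energy and is treated as non-speech.
frames = [[0.0], [0.2], [5.0], [5.2], [5.4], [9.0]]
energies = [1.0, 1.0, 1.0, 1.0, 1.0, 0.01]
speech = energy_vad(frames, energies, threshold=0.1)
codewords, weights = build_weighted_codebook(speech, k=2)
```

Dropping non-speech frames before clustering keeps silence and noise-only frames from pulling the codewords away from the speech distribution, which is the motivation the abstract gives for the VAD step.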
Noise robustness is one of the most important factors determining the recognition accuracy of an automatic speech recognition system in a noise-corrupted environment. Among the various approaches, normalizing the statistical quantities of speech features is a very promising direction for creating more noise-robust features. The related feature normalization approaches include cepstral mean subtraction (CMS), cepstral mean and variance normalization (CMVN), and histogram equalization (HEQ). In addition, the statistical quantities used in these techniques can be obtained in an utterance-wise manner or a codebook-wise manner. It has been shown that in most cases the latter performs better than the former.
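For the third technique, HEQ, a minimal utterance-wise sketch in Python is given here. It assumes a standard normal reference distribution and a rank-based estimate of the empirical CDF; both are common choices but are assumptions of this example, not details confirmed by the abstract. The function names are hypothetical.

```python
from statistics import NormalDist

def heq_dimension(values, reference=NormalDist(0.0, 1.0)):
    """Map one feature dimension so its histogram matches the reference distribution."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])  # frame indices, ascending by value
    out = [0.0] * n
    for rank, i in enumerate(order):
        cdf = (rank + 0.5) / n           # empirical CDF, kept strictly inside (0, 1)
        out[i] = reference.inv_cdf(cdf)  # inverse reference CDF gives the equalized value
    return out

def heq(frames):
    """Apply histogram equalization to every dimension of an utterance independently."""
    columns = [heq_dimension(list(col)) for col in zip(*frames)]
    return [list(frame) for frame in zip(*columns)]

# Toy one-dimensional feature track (made-up numbers).
equalized = heq_dimension([3.0, 1.0, 2.0])
```

Unlike CMS and CMVN, which only match the first one or two moments, this mapping matches the whole distribution of each dimension to the reference, at the cost of needing enough frames for a reliable CDF estimate.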
In this thesis, we focus on two issues. First, we develop a new procedure for constructing the pseudo-stereo codebook used in the codebook-based feature normalization approaches. The resulting codebook is shown to provide a better estimate of the feature statistics and thereby enhances the performance of the codebook-based approaches. Second, we propose a series of new feature normalization approaches: associative CMS (A-CMS), associative CMVN (A-CMVN), and associative HEQ (A-HEQ). In these approaches, two sources of statistical information about the features, one from the utterance and the other from the codebook, are integrated. Experimental results show that these new approaches perform significantly better than the conventional utterance-based and codebook-based ones. As a result, the methods proposed in this thesis effectively improve the noise robustness of speech features.
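One way to picture the integration of utterance and codebook statistics is the following Python sketch of an associative mean subtraction. The abstract does not state how the two sources are combined, so the convex combination with a mixing factor `alpha` is purely an assumption of this example, as are the function names and the use of a weighted codeword average as the codebook mean.

```python
from statistics import fmean

def utterance_mean(frames):
    """Per-dimension mean of the utterance's own frames."""
    return [fmean(dim) for dim in zip(*frames)]

def weighted_codebook_mean(codewords, weights):
    """Per-dimension mean of the codewords, weighted by the codeword weights."""
    total = sum(weights)
    dims = len(codewords[0])
    return [sum(w * c[d] for c, w in zip(codewords, weights)) / total
            for d in range(dims)]

def associative_cms(frames, codewords, weights, alpha=0.5):
    """Subtract a mean that blends utterance and codebook estimates.

    alpha is a hypothetical mixing factor: 1.0 reduces to utterance-based
    CMS, 0.0 to a purely codebook-based mean.
    """
    mu_u = utterance_mean(frames)
    mu_c = weighted_codebook_mean(codewords, weights)
    mu = [alpha * u + (1.0 - alpha) * c for u, c in zip(mu_u, mu_c)]
    return [[x - m for x, m in zip(frame, mu)] for frame in frames]

# Toy data (made-up numbers): 2 frames, 2 codewords with weights 1 and 3.
frames = [[1.0, 2.0], [3.0, 6.0]]
codewords = [[0.0, 0.0], [4.0, 8.0]]
weights = [1.0, 3.0]
normalized = associative_cms(frames, codewords, weights, alpha=0.5)
```

The same blending idea could in principle be applied to variances (for an A-CMVN-like method) or to distribution estimates (for an A-HEQ-like method), but how the thesis actually does this is not described in the abstract.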
Acknowledgements
Abstract (Chinese)
Abstract
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Research Motivation
1.2 Overview of Robust Speech Recognition Methods
1.3 Overview of the Research Approach
1.4 Thesis Organization
Chapter 2 Speech Feature Parameters and Acoustic Models
2.1 Speech Feature Extraction
2.2 Acoustic Model Construction
Chapter 3 Utterance-Based Robust Speech Feature Normalization Techniques
3.1 Utterance-Based Cepstral Mean Subtraction (U-CMS)
3.2 Utterance-Based Cepstral Mean and Variance Normalization (U-CMVN)
3.3 Utterance-Based Cepstral Histogram Equalization (U-HEQ)
Chapter 4 Codebook-Based Feature Normalization Techniques
4.1 Construction of the Pseudo-Stereo Codebook
4.2 Codebook-Based Feature Normalization Techniques
4.3 Chapter Summary
Chapter 5 Associative Feature Normalization Techniques
5.1 Associative CMS (A-CMS) and Associative CMVN (A-CMVN)
5.2 Associative HEQ (A-HEQ)
5.3 Chapter Summary
Chapter 6 Recognition Experiments and Discussion
6.1 Speech Database
6.2 Evaluation of Recognition Performance
6.3 Training and Recognition Results of the Baseline Experiment (Original MFCC Features)
6.4 Recognition Results of Utterance-Based Feature Normalization Techniques
6.5 Recognition Results of Codebook-Based Feature Normalization Techniques
6.6 Recognition Results of Associative Feature Normalization Methods
Chapter 7 Conclusions and Future Work
References