National Digital Library of Theses and Dissertations in Taiwan

Detailed Record

Author: 吳思予
Author (English): Wu, Szyu
Title: 語音特徵倒頻帶塑型與正規化於強健性語音辨識之研究
Title (English): The study of speech feature shaping and normalization in quefrency bands for noise-robust speech recognition
Advisor: 洪志偉
Advisor (English): Hung, Jeihweih
Oral examination committee: 林嘉慶, 陳柏琳, 林容杉
Oral examination committee (English): Lin, Jiachin; Chen, Berlin; Lin, Jungshan
Date of oral examination: 2012-07-25
Degree: Master
Institution: National Chi Nan University
Department: Department of Electrical Engineering
Discipline: Engineering
Field: Electrical and Information Engineering
Document type: Academic thesis
Year of publication: 2012
Graduation academic year: 100 (ROC era, 2011-2012)
Language: Chinese
Number of pages: 47
Keywords (Chinese): 倒頻譜特徵 (cepstral features), 分頻 (sub-band division), 統計圖等化法 (histogram equalization), 語音辨識 (speech recognition)
Keywords (English): sub-band division, speech recognition, robust speech feature, histogram equalization
Usage statistics:
  • Cited by: 0
  • Views: 295
  • Downloads: 57
  • Bookmarked: 0
This thesis proposes a new technique for enhancing the noise robustness of speech features in order to improve speech recognition performance in noisy environments. The technique, termed weighted sub-band level histogram equalization (WS-HEQ), is mainly motivated by the recently proposed sub-band level histogram equalization (S-HEQ) and improves upon both its robustness and its execution efficiency. In the proposed WS-HEQ, we explicitly account for the fact that the high-pass and low-pass portions of the intra-frame cepstral features carry information of unequal importance, and accordingly apply a moderate attenuation to the high-pass portion; combined with histogram equalization, this reduces the noise effect on the speech features more effectively than S-HEQ does. We propose four variants of WS-HEQ, which differ in the number of HEQ operations applied and in the form of the filter used, and three of these variants have a computational complexity clearly lower than that of S-HEQ. On the internationally used Aurora-2 database, we verify that the proposed WS-HEQ substantially improves recognition accuracy in a variety of noisy environments; all four WS-HEQ variants achieve recognition rates clearly higher than the original HEQ, and in most cases they also outperform S-HEQ.
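
As a reference for the abstract above, the following Python sketch illustrates the quantile-based histogram equalization (HEQ) building block on which S-HEQ and the proposed WS-HEQ are built: each cepstral coefficient stream is mapped through its empirical CDF and then through the inverse CDF of a standard normal reference. The rank-based CDF estimate and the Gaussian reference distribution are common choices assumed here for illustration, not details taken from the thesis.

import numpy as np
from scipy.stats import norm

def histogram_equalization(features):
    """Quantile-based HEQ: equalize each cepstral coefficient stream so that
    its distribution approximately matches a standard normal reference."""
    num_frames, num_dims = features.shape
    equalized = np.empty_like(features, dtype=float)
    for d in range(num_dims):
        # Rank of each frame's value within its coefficient stream (0 .. T-1).
        ranks = np.argsort(np.argsort(features[:, d]))
        # Empirical CDF values in the open interval (0, 1).
        empirical_cdf = (ranks + 0.5) / num_frames
        # Map through the inverse Gaussian CDF (probit function).
        equalized[:, d] = norm.ppf(empirical_cdf)
    return equalized

Both the training and test features would typically be equalized in this way before acoustic model training and recognition.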


In this study, we develop a novel noise-robustness method, termed weighted sub-band level histogram equalization (WS-HEQ), to improve speech recognition accuracy in noise-corrupted environments. Based on the observation that the high-pass and low-pass portions of the intra-frame cepstral features are of unequal importance for speech recognition and exhibit different signal-to-noise ratios (SNRs), WS-HEQ attenuates the high-pass portion in order to highlight the speech components and reduce the effect of noise. Furthermore, we present four variants of WS-HEQ, which are primarily based on the structure of sub-band level histogram equalization (S-HEQ).
In experiments conducted on the Aurora-2 connected US-English digit database, we show that all four presented variants of WS-HEQ yield significant recognition improvements relative to the MFCC baseline in various noise-corrupted situations. WS-HEQ outperforms HEQ in recognition accuracy and performs better than S-HEQ in most cases. In addition, WS-HEQ can be implemented more efficiently than S-HEQ, since it requires fewer HEQ operations.
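
The core idea described in both abstracts, shaping the intra-frame cepstral vector so that its high-pass (high-quefrency) portion is attenuated before histogram equalization, can be sketched as follows. The intra-frame DCT, the cutoff index, and the single attenuation weight used below are illustrative assumptions; the thesis defines four concrete WS-HEQ variants that differ in the filter form and in the number of HEQ passes, and those details are not reproduced here.

import numpy as np
from scipy.fftpack import dct, idct

def ws_heq_style_shaping(features, cutoff=6, high_weight=0.5):
    """Illustrative WS-HEQ-style processing of an MFCC matrix of shape
    (num_frames, num_coeffs).  `cutoff` and `high_weight` are assumed values."""
    # 1) Intra-frame transform: treat each frame's cepstral vector as a short
    #    sequence and take a DCT along the coefficient axis.
    intra_spectrum = dct(features, type=2, norm='ortho', axis=1)

    # 2) Attenuate the high-pass portion, which the abstract argues is less
    #    important for recognition and more vulnerable to noise.
    intra_spectrum[:, cutoff:] *= high_weight

    # 3) Return to the cepstral domain.
    shaped = idct(intra_spectrum, type=2, norm='ortho', axis=1)

    # 4) Equalize each coefficient stream over time, using the
    #    histogram_equalization() helper sketched above.
    return histogram_equalization(shaped)

In this sketch the shaping step is a fixed linear operation per frame, so the dominant cost is the single HEQ pass at the end; this matches the abstract's point that variants needing fewer HEQ operations than S-HEQ are cheaper to run.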

Table of Contents
Acknowledgments
Chinese Abstract
Abstract
Table of Contents
List of Figures
List of Tables

Chapter 1  Introduction
1-1 Research motivation
1-2 Research direction
1-3 Research methods
1-4 Thesis outline

Chapter 2  Mel-Frequency Cepstral Feature Extraction and Robustness Techniques
2-1 Extraction procedure and principles of mel-frequency cepstral speech features
2-2 Selected robustness techniques for speech feature time series
2-2.1 Cepstral mean subtraction (CMS)
2-2.2 Cepstral mean and variance normalization (MVN)
2-2.3 Cepstral gain normalization (CGN)
2-2.4 Histogram equalization (HEQ)
2-3 Robustness techniques based on spectral processing of feature time series

Chapter 3  Weighted Sub-band Level Histogram Equalization
3-1 Sub-band level histogram equalization (S-HEQ)
3-2 The proposed weighted sub-band level histogram equalization (WS-HEQ)
3-2.1 Principles
3-2.2 Implementation steps

Chapter 4  Experimental Setup and Baseline Results
4-1 Experimental environment and baseline feature parameter settings
4-2 Recognition results of the original MFCC features (baseline experiments)

Chapter 5  Experimental Results and Discussion of the Robust Speech Feature Techniques
5-0 Results of sub-band level histogram equalization (S-HEQ)
5-1 Results of weighted sub-band level histogram equalization I (WS-HEQ(I))
5-2 Results of weighted sub-band level histogram equalization II (WS-HEQ(II))
5-3 Results of weighted sub-band level histogram equalization III (WS-HEQ(III))
5-4 Results of weighted sub-band level histogram equalization IV (WS-HEQ(IV))

Chapter 6  Conclusions and Future Work

References





