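Author: 黃勝源
Author (English): Sheng-yuan Huang
Thesis title (Chinese): 強健性語音辨識中分頻帶調變頻譜補償之研究
Thesis title (English): The Study of Sub-band Modulation Spectrum Compensation for Robust Speech Recognition
Advisor: 洪志偉
Advisor (English): Jeih-weih Hung
Degree: Master's
Institution: National Chi Nan University (國立暨南國際大學)
Department: Department of Electrical Engineering
Discipline: Engineering
Field: Electrical and Computer Engineering
Thesis type: Academic thesis
Year of publication: 2009
Graduation academic year: 97 (ROC calendar)
Language: Chinese
Number of pages: 65
Keywords (Chinese): 語音辨識, 調變頻譜, 強健性語音特徵參數
Keywords (English): speech recognition, modulation spectrum, robust speech features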
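Although speech technology has advanced rapidly, automatic speech recognition remains a topic worth further research and development. Most current speech recognition systems are used in quiet, interference-free environments, where they achieve quite satisfactory accuracy. When they are applied in real environments, however, environmental noise often corrupts the speech signal and recognition performance degrades markedly. Robustness techniques, developed over many years, aim to remedy this weakness.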
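Among the many robustness techniques, one class statistically normalizes the speech features. Traditionally, these methods normalize the full-band temporal sequence of the speech features. When their effectiveness is analyzed, however, the degree to which the modulation spectrum is normalized is usually taken as the measure of performance, so normalizing the modulation spectrum of the speech features directly should also give good results. In addition, modulation frequency components at different frequencies are not equally important, a property that conventional temporal-sequence normalization largely ignores. Based on these observations, this thesis proposes a series of sub-band modulation spectrum statistics normalization methods, which normalize the statistics of different frequency bands separately and thereby improve the robustness of speech features in noisy environments. In recognition experiments on the internationally used Aurora-2 connected-digit database, the proposed methods achieve a relative error rate reduction of up to 65% over the baseline, and a relative error rate reduction of 7% to 32% over temporal-sequence normalization. These results confirm that the new methods improve recognition performance in noisy environments more effectively.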
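The relative error rate reduction quoted above is conventionally the fraction of the baseline's errors that the new method removes. A minimal sketch of that arithmetic follows. The function name and the accuracy values in the usage note are hypothetical illustrations, not figures from the thesis.

```python
def relative_error_reduction(baseline_acc, new_acc):
    """Relative error rate reduction (%) from two recognition accuracies (%).

    Assumes the error rate is 100 minus the accuracy, which is the usual
    convention; this record does not spell out the exact definition used.
    """
    base_err = 100.0 - baseline_acc
    new_err = 100.0 - new_acc
    if base_err <= 0.0:
        raise ValueError("baseline error rate must be positive")
    # Share of the baseline's errors eliminated by the new method.
    return 100.0 * (base_err - new_err) / base_err
```

For instance, with made-up accuracies of 60% for a baseline and 86% for an improved front end, the error rate falls from 40% to 14%, so `relative_error_reduction(60.0, 86.0)` gives 65.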
In this paper, we propose a novel scheme for applying feature statistics normalization techniques to robust speech recognition. In the proposed approach, the temporal-domain feature sequence to be processed is first converted into the modulation spectral domain. The magnitude part of the modulation spectrum is decomposed into non-uniform sub-band segments, and each sub-band segment is then processed individually by a well-known normalization method, such as mean normalization (MN), mean and variance normalization (MVN), or histogram equalization (HEQ). Finally, we reconstruct the feature stream from the modified sub-band magnitude spectral segments and the original phase spectrum using the inverse DFT. With this process, the components of the feature sequence that correspond to more important modulation spectral bands can be processed separately. For the Aurora-2 clean-condition training task, the newly proposed sub-band spectral MN, MVN and HEQ provide relative error rate reductions of 18.66% and 23.58% over the conventional temporal MVN and HEQ, respectively.
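The pipeline described above (DFT, sub-band split of the magnitude spectrum, per-band normalization, inverse DFT with the original phase) can be sketched for the MVN case as follows. This is an illustration under stated assumptions, not the thesis's implementation: the function name, the band edges, the reference statistics (`ref_mean`, `ref_std`, imagined here as estimated from clean training data), and the clipping of negative magnitudes are all hypothetical choices, since this record does not give the actual band layout or target statistics.

```python
import numpy as np

def subband_spectral_mvn(x, band_edges, ref_mean, ref_std, eps=1e-12):
    """Sub-band mean and variance normalization of a modulation spectrum.

    x          : 1-D time series of one feature dimension for one utterance
    band_edges : increasing DFT-bin indices; band b covers
                 [band_edges[b], band_edges[b + 1])  (assumed layout)
    ref_mean, ref_std : target mean / standard deviation of the magnitude
                 spectrum in each band (assumed reference statistics)
    """
    x = np.asarray(x, dtype=float)
    spec = np.fft.rfft(x)                  # modulation spectrum of the sequence
    mag, phase = np.abs(spec), np.angle(spec)
    new_mag = mag.copy()
    for b in range(len(band_edges) - 1):
        lo, hi = band_edges[b], band_edges[b + 1]
        seg = mag[lo:hi]
        if seg.size == 0:
            continue
        # Standardize the band, then map it onto the reference statistics.
        z = (seg - seg.mean()) / (seg.std() + eps)
        new_mag[lo:hi] = z * ref_std[b] + ref_mean[b]
    # Assumption: a magnitude cannot be negative, so clip at zero.
    new_mag = np.maximum(new_mag, 0.0)
    # Rebuild the sequence from the new magnitudes and the original phase.
    return np.fft.irfft(new_mag * np.exp(1j * phase), n=x.size)
```

Narrow bands at low modulation frequencies and wider bands at higher ones, for example `band_edges = [0, 2, 5, 12, 33]` for a 64-frame sequence, would give a non-uniform split of the kind the abstract mentions. Swapping the two lines inside the loop for a mean shift or a histogram mapping would give the MN and HEQ variants.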
Acknowledgments
Abstract (Chinese)
Abstract
Table of Contents
List of Figures
List of Tables

Chapter 1 Introduction
1-1 Research Motivation
1-2 Research Direction
1-3 Chapter Outline

Chapter 2 Speech Feature Extraction and Conventional Temporal-Domain Feature Statistics Normalization
2-1 Speech Feature Extraction Procedure
2-2 Well-Known Speech Robustness Techniques
2-2-1 Cepstral Mean Normalization (CMN)
2-2-2 Cepstral Mean and Variance Normalization (CMVN)
2-2-3 Cepstral Mean and Variance Normalization with Auto-Regressive Moving-Average Filtering (MVA)
2-2-4 Histogram Equalization (HEQ)
2-2-5 Modulation Spectrum Histogram Equalization (SHE)

Chapter 3 Sub-band Modulation Spectrum Statistics Normalization Based on the Magnitude Spectrum
Chapter 4 Experimental Setup and Baseline Results
4-1 Speech Database
4-2 Acoustic Model Training
4-3 Speech Feature Settings and Baseline Recognition Results

Chapter 5 Experimental Results and Discussion
5-1 Results for Conventional Temporal-Domain Feature Statistics Normalization
5-1-1 Cepstral Mean Normalization (CMN)
5-1-2 Cepstral Mean and Variance Normalization (CMVN)
5-1-3 Mean and Variance Normalization with Auto-Regressive Moving-Average Filtering (MVA)
5-1-4 Histogram Equalization (HEQ)
5-1-5 Overall Analysis of Conventional Temporal-Domain Feature Statistics Normalization
5-2 Results for Full-band and Sub-band Modulation Spectrum Normalization
5-2-1 Modulation Spectrum Mean Normalization (SMN)
5-2-2 Modulation Spectrum Mean and Variance Normalization (SMVN)
5-2-3 Modulation Spectrum Histogram Equalization (SHE)
5-2-4 Overall Analysis of Modulation Spectrum Normalization
5-3 Results for Modulation Spectrum Normalization Combined with Temporal-Domain Feature Normalization
5-3-1 Modulation Spectrum Mean Normalization Combined with Temporal-Domain Feature Normalization
5-3-2 Modulation Spectrum Mean and Variance Normalization Combined with Temporal-Domain Feature Normalization
5-3-3 Modulation Spectrum Histogram Equalization Combined with Temporal-Domain Feature Normalization
5-4 Chapter Summary

Chapter 6 Conclusions and Future Work

References