
National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)


Detailed Record

Author: Hsi-Yuan Liao (廖璽元)
Title: Evaluation of Noise-Robustness Techniques on Various Types of Cepstral Features for Speech Recognition (雜訊強健技術運用於不同種類之倒頻譜特徵於語音辨識之效能探究)
Advisor: Jeih-weih Hung (洪志偉)
Oral Defense Committee: Jung-Shan Lin (林容杉), Ber-lin Chen (陳柏琳)
Oral Defense Date: 2013-06-14
Degree: Master's
Institution: National Chi Nan University (國立暨南國際大學)
Department: Electrical Engineering (電機工程學系)
Discipline: Engineering; Electrical and Computer Engineering
Thesis Type: Academic thesis
Year of Publication: 2013
Graduation Academic Year: 101 (2012-2013)
Language: Chinese
Pages: 68
Keywords (Chinese, translated): cepstral coefficients; feature robustness; noisy environments
Keywords (English): noise robustness; speech recognition; gammatone filter; mel filter; speech features
In this thesis, we analyze and present three speech feature representations used in speech recognition: mel-frequency cepstral coefficients (MFCC), warped discrete Fourier transform cepstral coefficients (WDFTCC), and gammatone cepstral coefficients (GTCC), and highlight the differences among them. In addition, we combine the three feature types with several feature-robustness techniques and measure the recognition rates obtained in both clean and noisy environments, in order to compare the discriminability and noise immunity of the three features, as well as how well each combines with the robustness algorithms. According to the experimental results on the Aurora-2 digit database, when no robustness algorithm is used, the conventional MFCC performs slightly better than WDFTCC and GTCC; however, when combined with a robustness algorithm, the two newer features, WDFTCC and GTCC, achieve better recognition performance than MFCC. (Translated from the Chinese abstract.)
In this thesis, we first introduce three types of speech feature representation: mel-frequency cepstral coefficients (MFCC), warped discrete Fourier transform cepstral coefficients (WDFTCC), and gammatone frequency cepstral coefficients (GTCC). These representations differ primarily in the filterbank they employ. We then apply several noise-robustness techniques to the three feature types to evaluate how well each technique combines with each feature in improving recognition accuracy over the wide range of noise-corrupted environments provided by the Aurora-2 connected-digit database. Experimental results reveal that, without any noise-robustness processing, MFCC performs slightly better than WDFTCC and GTCC, whereas GTCC and WDFTCC outperform MFCC when any of the noise-robustness approaches is applied beforehand.
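To make the kind of processing the abstract refers to concrete, the following is a minimal sketch of cepstral mean and variance normalization (CMVN), one of the robustness techniques studied in the thesis. The feature-matrix shape (frames × 13 coefficients) and the toy data are illustrative assumptions, not the thesis's actual implementation or settings.

```python
import numpy as np

def cmvn(features: np.ndarray) -> np.ndarray:
    """Per-utterance CMVN: normalize each cepstral dimension to
    zero mean and unit variance over all frames, reducing the
    mismatch caused by additive noise and channel effects."""
    mean = features.mean(axis=0)        # per-dimension mean over frames
    std = features.std(axis=0) + 1e-10  # small floor avoids division by zero
    return (features - mean) / std

# Hypothetical (frames x coefficients) cepstral matrix as a stand-in
feats = np.random.default_rng(0).normal(5.0, 2.0, size=(200, 13))
norm = cmvn(feats)
print(np.allclose(norm.mean(axis=0), 0.0, atol=1e-8))  # → True
print(np.allclose(norm.std(axis=0), 1.0, atol=1e-6))   # → True
```

Because both clean and noisy versions of an utterance are mapped to the same first- and second-order statistics, the normalized features vary less across acoustic conditions, which is why CMVN (and its relatives CGN and CHN) improves recognition in noise regardless of which cepstral front-end produced the features.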
Abstract (Chinese)
Abstract (English)
Table of Contents
List of Figures
List of Tables

Table of Contents

Chapter 1: Introduction
1.1 Research Motivation
1.2 Overview of the Research Methods
1.3 Thesis Organization

Chapter 2: Overview of General Speech Feature Extraction
2.1 Common Properties of Speech Features
2.2 Extraction Procedures and Characteristics of the Three Feature Types

Chapter 3: The Individual Feature Extraction Methods
3.1 The Mel-Scale Filterbank Used in MFCC
3.2 The All-Pass Transformed Filterbank Used in WDFTCC
3.3 The Gammatone Filterbank Used in GTCC

Chapter 4: Noise-Robustness Techniques for Feature Improvement
4.1 Cepstral Mean and Variance Normalization (CMVN)
4.2 Cepstral Gain Normalization (CGN)
4.3 Cepstral Histogram Normalization (CHN)
4.4 Small Power Boosting (SPB)
4.5 Small Power Reduction (SPR)

Chapter 5: Experimental Setup and Results
5.1 Speech Database Setup
5.2 Speech Features and Acoustic Models
5.3 Evaluation of Recognition Performance
5.4 Baseline Comparison of the Three Feature Types
5.5 Comparison of the Three Feature Types After Robustness Processing
5.6 Overall Comparison and Discussion

Chapter 6: Conclusions and Future Work
References

List of Figures

Figure 1.1: Clean speech corrupted by additive and convolutional noise (illustration)
Figure 2.1: Flowcharts of the various feature extraction methods
Figure 3.1: Flowchart of speech feature extraction
Figure 3.2: Conversion from linear frequency to mel frequency
Figure 3.3: Frequency response of the mel filterbank
Figure 3.4: All-pass filter, β = 0
Figure 3.5: All-pass filter, β = 0.56
Figure 3.6: All-pass filter, β = -0.56
Figure 3.7: Frequency response of the filterbank used in WDFTCC
Figure 3.8: Impulse response of the gammatone filter
Figure 3.9: Frequency (magnitude) response of the gammatone filter
Figure 3.10: Frequency response of the gammatone filterbank
Figure 4.1: MFCC c1 time series of a clean utterance and its noisy counterpart
Figure 4.2: MFCC c1 time series of a clean utterance and its noisy counterpart after CMVN
Figure 4.3: MFCC c1 time series of a clean utterance and its noisy counterpart
Figure 4.4: MFCC c1 time series of a clean utterance and its noisy counterpart after CGN
Figure 4.5: MFCC c1 time series of a clean utterance and its noisy counterpart
Figure 4.6: MFCC c1 time series of a clean utterance and its noisy counterpart after CHN
Figure 4.7: Log mel-spectral power of a clean utterance and its noisy counterpart
Figure 4.8: Mel-spectral power of a clean utterance and its noisy counterpart after SPB
Figure 4.9: Log mel-spectral power of a clean utterance and its noisy counterpart
Figure 4.10: Mel-spectral power of a clean utterance and its noisy counterpart after SPR
Figure 5.1: Illustration of a left-to-right continuous-density hidden Markov model (HMM)
Figure 5.2: Average recognition rates (%) for different noise types
Figure 5.3: Average recognition rates (%) for different noise types
Figure 5.4: Average recognition rates (%) for different noise types
Figure 5.5: Average recognition rates (%) for different noise types
Figure 5.6: Average recognition rates (%) for different noise types
Figure 5.7: Average recognition rates (%) for different noise types
Figure 5.8: Average recognition rates (%) for different normalization methods

List of Tables

Table 4.1: Average deviation between a clean utterance and its noisy counterpart for different feature coefficients
Table 4.2: Average deviation between a clean utterance and its noisy counterpart for different feature coefficients
Table 4.3: Average deviation between a clean utterance and its noisy counterpart for different feature coefficients
Table 4.4: Average deviation between a clean utterance and its noisy counterpart for different feature coefficients
Table 4.5: Average deviation between a clean utterance and its noisy counterpart for different feature coefficients
Table 5.1: Overview of the Aurora-2.0 corpus
Table 5.2: Average recognition rates (%) of the MFCC baseline for each test set at different SNRs
Table 5.3: Average recognition rates (%) of the WDFTCC baseline for each test set at different SNRs
Table 5.4: Average recognition rates (%) of the GTCC baseline for each test set at different SNRs
Table 5.5: Average recognition rates (%) at different SNRs for CMVN applied to MFCC
Table 5.6: Average recognition rates (%) at different SNRs for CMVN applied to WDFTCC
Table 5.7: Average recognition rates (%) at different SNRs for CMVN applied to GTCC
Table 5.8: Average recognition rates (%) at different SNRs for CGN applied to MFCC
Table 5.9: Average recognition rates (%) at different SNRs for CGN applied to WDFTCC
Table 5.10: Average recognition rates (%) at different SNRs for CGN applied to GTCC
Table 5.11: Average recognition rates (%) at different SNRs for CHN applied to MFCC
Table 5.12: Average recognition rates (%) at different SNRs for CHN applied to WDFTCC
Table 5.13: Average recognition rates (%) at different SNRs for CHN applied to GTCC
Table 5.14: Average recognition rates (%) at different SNRs for SPB applied to MFCC
Table 5.15: Average recognition rates (%) at different SNRs for SPB applied to WDFTCC
Table 5.16: Average recognition rates (%) at different SNRs for SPB applied to GTCC
Table 5.17: Average recognition rates (%) at different SNRs for SPR applied to MFCC
Table 5.18: Average recognition rates (%) at different SNRs for SPR applied to WDFTCC
Table 5.19: Average recognition rates (%) at different SNRs for SPR applied to GTCC