
National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)


Detailed Record

Author: Hsi-Yuan Liao (廖璽元)
Title: Evaluation of Noise-Robustness Techniques on Various Types of Cepstral Features for Speech Recognition (雜訊強健技術運用於不同種類之倒頻譜特徵於語音辨識之效能探究)
Advisor: Jeih-weih Hung (洪志偉)
Oral Defense Committee: Jung-Shan Lin (林容杉), Ber-lin Chen (陳柏琳)
Oral Defense Date: 2013-06-14
Degree: Master's
Institution: National Chi Nan University (國立暨南國際大學)
Department: Electrical Engineering (電機工程學系)
Discipline: Engineering; Electrical and Computer Engineering
Thesis Type: Academic thesis
Year of Publication: 2013
Graduation Academic Year: 101 (2012-2013)
Language: Chinese
Pages: 68
Keywords (Chinese, translated): cepstral coefficients; feature robustness; noisy environments
Keywords (English): noise robustness; speech recognition; gammatone filter; mel filter; speech features
In this thesis, we analyze and present three speech feature representations used in speech recognition: mel-frequency cepstral coefficients (MFCC), warped discrete Fourier transform cepstral coefficients (WDFTCC), and gammatone cepstral coefficients (GTCC), and highlight the differences among them. In addition, we combine the three feature types with several feature-robustness techniques and measure the recognition rates obtained in both clean and noisy environments, in order to compare the discriminability and noise immunity of the three features, as well as how well each combines with the robustness algorithms. According to the experimental results on the Aurora-2 digit database, when no robustness algorithm is used, the conventional MFCC performs slightly better than WDFTCC and GTCC; however, when combined with a robustness algorithm, the two newer features, WDFTCC and GTCC, achieve better recognition performance than MFCC. (Translated from the Chinese abstract.)
In this thesis, we first introduce three types of speech feature representation: mel-frequency cepstral coefficients (MFCC), warped discrete Fourier transform cepstral coefficients (WDFTCC), and gammatone frequency cepstral coefficients (GTCC). These representations differ primarily in the filterbank they employ. We then apply several noise-robustness techniques to the three feature types to evaluate how well each technique combines with each feature in improving recognition accuracy over the wide range of noise-corrupted environments provided by the Aurora-2 connected-digit database. Experimental results reveal that, without any noise-robustness processing, MFCC performs slightly better than WDFTCC and GTCC, whereas GTCC and WDFTCC outperform MFCC when any of the noise-robustness approaches is applied beforehand.
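To make the kind of processing the abstract refers to concrete, the following is a minimal sketch of cepstral mean and variance normalization (CMVN), one of the robustness techniques studied in the thesis. The feature-matrix shape (frames × 13 coefficients) and the toy data are illustrative assumptions, not the thesis's actual implementation or settings.

```python
import numpy as np

def cmvn(features: np.ndarray) -> np.ndarray:
    """Per-utterance CMVN: normalize each cepstral dimension to
    zero mean and unit variance over all frames, reducing the
    mismatch caused by additive noise and channel effects."""
    mean = features.mean(axis=0)        # per-dimension mean over frames
    std = features.std(axis=0) + 1e-10  # small floor avoids division by zero
    return (features - mean) / std

# Hypothetical (frames x coefficients) cepstral matrix as a stand-in
feats = np.random.default_rng(0).normal(5.0, 2.0, size=(200, 13))
norm = cmvn(feats)
print(np.allclose(norm.mean(axis=0), 0.0, atol=1e-8))  # → True
print(np.allclose(norm.std(axis=0), 1.0, atol=1e-6))   # → True
```

Because both clean and noisy versions of an utterance are mapped to the same first- and second-order statistics, the normalized features vary less across acoustic conditions, which is why CMVN (and its relatives CGN and CHN) improves recognition in noise regardless of which cepstral front-end produced the features.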
Abstract (Chinese)
Abstract (English)
Table of Contents
List of Figures
List of Tables

Table of Contents

Chapter 1: Introduction
1.1 Research Motivation
1.2 Overview of the Research Methods
1.3 Thesis Organization

Chapter 2: Overview of General Speech Feature Extraction
2.1 Common Properties of Speech Features
2.2 Extraction Procedures and Characteristics of the Three Feature Types

Chapter 3: The Individual Feature Extraction Methods
3.1 The Mel-Scale Filterbank Used in MFCC
3.2 The All-Pass Transformed Filterbank Used in WDFTCC
3.3 The Gammatone Filterbank Used in GTCC

Chapter 4: Noise-Robustness Techniques for Feature Improvement
4.1 Cepstral Mean and Variance Normalization (CMVN)
4.2 Cepstral Gain Normalization (CGN)
4.3 Cepstral Histogram Normalization (CHN)
4.4 Small Power Boosting (SPB)
4.5 Small Power Reduction (SPR)

Chapter 5: Experimental Setup and Results
5.1 Speech Database Setup
5.2 Speech Features and Acoustic Models
5.3 Evaluation of Recognition Performance
5.4 Baseline Comparison of the Three Feature Types
5.5 Comparison of the Three Feature Types After Robustness Processing
5.6 Overall Comparison and Discussion

Chapter 6: Conclusions and Future Work
References

List of Figures

Figure 1.1: Clean speech corrupted by additive and convolutional noise (illustration)
Figure 2.1: Flowcharts of the various feature extraction methods
Figure 3.1: Flowchart of speech feature extraction
Figure 3.2: Conversion from linear frequency to mel frequency
Figure 3.3: Frequency response of the mel filterbank
Figure 3.4: All-pass filter, β = 0
Figure 3.5: All-pass filter, β = 0.56
Figure 3.6: All-pass filter, β = -0.56
Figure 3.7: Frequency response of the filterbank used in WDFTCC
Figure 3.8: Impulse response of the gammatone filter
Figure 3.9: Frequency (magnitude) response of the gammatone filter
Figure 3.10: Frequency response of the gammatone filterbank
Figure 4.1: MFCC c1 time series of a clean utterance and its noisy counterpart
Figure 4.2: MFCC c1 time series of a clean utterance and its noisy counterpart after CMVN
Figure 4.3: MFCC c1 time series of a clean utterance and its noisy counterpart
Figure 4.4: MFCC c1 time series of a clean utterance and its noisy counterpart after CGN
Figure 4.5: MFCC c1 time series of a clean utterance and its noisy counterpart
Figure 4.6: MFCC c1 time series of a clean utterance and its noisy counterpart after CHN
Figure 4.7: Log mel-spectral power of a clean utterance and its noisy counterpart
Figure 4.8: Mel-spectral power of a clean utterance and its noisy counterpart after SPB
Figure 4.9: Log mel-spectral power of a clean utterance and its noisy counterpart
Figure 4.10: Mel-spectral power of a clean utterance and its noisy counterpart after SPR
Figure 5.1: Illustration of a left-to-right continuous-density hidden Markov model (HMM)
Figure 5.2: Average recognition rates (%) for different noise types
Figure 5.3: Average recognition rates (%) for different noise types
Figure 5.4: Average recognition rates (%) for different noise types
Figure 5.5: Average recognition rates (%) for different noise types
Figure 5.6: Average recognition rates (%) for different noise types
Figure 5.7: Average recognition rates (%) for different noise types
Figure 5.8: Average recognition rates (%) for different normalization methods

List of Tables

Table 4.1: Average deviation between a clean utterance and its noisy counterpart for different feature coefficients
Table 4.2: Average deviation between a clean utterance and its noisy counterpart for different feature coefficients
Table 4.3: Average deviation between a clean utterance and its noisy counterpart for different feature coefficients
Table 4.4: Average deviation between a clean utterance and its noisy counterpart for different feature coefficients
Table 4.5: Average deviation between a clean utterance and its noisy counterpart for different feature coefficients
Table 5.1: Overview of the Aurora-2.0 corpus
Table 5.2: Average recognition rates (%) of the MFCC baseline for each test set at different SNRs
Table 5.3: Average recognition rates (%) of the WDFTCC baseline for each test set at different SNRs
Table 5.4: Average recognition rates (%) of the GTCC baseline for each test set at different SNRs
Table 5.5: Average recognition rates (%) at different SNRs for CMVN applied to MFCC
Table 5.6: Average recognition rates (%) at different SNRs for CMVN applied to WDFTCC
Table 5.7: Average recognition rates (%) at different SNRs for CMVN applied to GTCC
Table 5.8: Average recognition rates (%) at different SNRs for CGN applied to MFCC
Table 5.9: Average recognition rates (%) at different SNRs for CGN applied to WDFTCC
Table 5.10: Average recognition rates (%) at different SNRs for CGN applied to GTCC
Table 5.11: Average recognition rates (%) at different SNRs for CHN applied to MFCC
Table 5.12: Average recognition rates (%) at different SNRs for CHN applied to WDFTCC
Table 5.13: Average recognition rates (%) at different SNRs for CHN applied to GTCC
Table 5.14: Average recognition rates (%) at different SNRs for SPB applied to MFCC
Table 5.15: Average recognition rates (%) at different SNRs for SPB applied to WDFTCC
Table 5.16: Average recognition rates (%) at different SNRs for SPB applied to GTCC
Table 5.17: Average recognition rates (%) at different SNRs for SPR applied to MFCC
Table 5.18: Average recognition rates (%) at different SNRs for SPR applied to WDFTCC
Table 5.19: Average recognition rates (%) at different SNRs for SPR applied to GTCC