National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)

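Graduate student: Wei-yi Tsai (蔡韋億)
Thesis title: Design of Temporal Filters Based on Modulation Spectrum for Robust Speech Recognition (強健性語音辨認中利用調變頻譜所設計之時間序列濾波器的研究)
Advisor: Jeih-weih Hung (洪志偉)
Degree: Master's
Institution: National Chi Nan University (國立暨南國際大學)
Department: Department of Electrical Engineering
Discipline: Engineering
Field: Electrical and computer engineering
Thesis type: Academic thesis
Year of publication: 2006
Academic year of graduation: 94 (ROC calendar)
Language: Chinese
Number of pages: 109
Keywords (Chinese): robust speech recognition (強健性語音辨識); temporal filter (時間序列濾波器); modulation spectrum (調變頻譜)
Keywords (English): robust speech recognition; temporal filter; modulation spectra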
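With the arrival of the technology era, faster computer processing, and the spread of electronic devices, people now use audio-visual multimedia systems more often in daily life, for example digital television, digital video recorders, and computerized song-request systems. For these reasons, current technology products are trending toward small, lightweight, multifunctional designs. Speech has always been the most natural and direct means of human communication. If speech can serve as the bridge between humans and computers, it will relieve the inconvenience of keyboards, keypads, and mice. Many environments also do not permit keyboard or mouse input at all, so speech is a very attractive choice for future input methods. For example, if a car navigation system accepts spoken input and communicates with the driver directly through spoken output, its convenience improves greatly. However, current speech recognition systems often fail to deliver their full benefit; one reason is that differences in the background environment and channel effects degrade recognition performance.
This thesis aims to improve the temporal filters applied to feature parameters in a speech recognition system so as to further increase recognition robustness. Data-driven temporal filters designed with optimization criteria have previously been shown to raise recognition rates in noisy environments. However, those filters are usually designed from the statistics of the feature parameters in the time domain. This thesis proposes three new data-driven temporal filters that instead use the statistics of the feature parameters in the modulation frequency domain: Constrained Principal Component Analysis (C-PCA), Constrained Linear Discriminant Analysis (C-LDA), and Constrained Maximum Class Distance (C-MCD). The optimal temporal filter coefficients are obtained directly from the modulation-domain statistics. We also combine these new techniques with conventional cepstral normalization methods, namely Cepstral Mean and Variance Normalization (CMVN) and Cepstral Gain Normalization (CGN). All recognition experiments use the internationally adopted Aurora 2.0 digit speech database. Preliminary results show that, when applied to Mel-Frequency Cepstral Coefficients (MFCC), the three proposed temporal filters significantly improve recognition accuracy and perform well across the various noise environments. They are also additive with CMVN and CGN, giving a further gain in recognition accuracy.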
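To make the terms concrete, the following Python sketch shows what a temporal filter and a modulation spectrum are for a single feature dimension. It is an illustration only: the function names `modulation_spectrum` and `temporal_filter`, the synthetic trajectory, and the three low-pass coefficients are assumptions made here, and none of them come from the thesis. The thesis's C-PCA, C-LDA, and C-MCD procedures for choosing the coefficients are not reproduced.

```python
import cmath
import math


def modulation_spectrum(trajectory):
    """Magnitude of the DFT of one feature dimension taken along the frame axis."""
    n = len(trajectory)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = 0j
        for t, value in enumerate(trajectory):
            acc += value * cmath.exp(-2j * math.pi * k * t / n)
        spectrum.append(abs(acc))
    return spectrum


def temporal_filter(trajectory, coeffs):
    """Apply an FIR filter along time: y[t] = sum_k coeffs[k] * x[t - k]."""
    out = []
    for t in range(len(trajectory)):
        acc = 0.0
        for k, h in enumerate(coeffs):
            if t - k >= 0:  # samples before the first frame are treated as zero
                acc += h * trajectory[t - k]
        out.append(acc)
    return out


# Hypothetical trajectory over 32 frames: a slow component (2 cycles)
# plus a weaker fast component (12 cycles).
num_frames = 32
x = [
    math.sin(2 * math.pi * 2 * t / num_frames)
    + 0.5 * math.sin(2 * math.pi * 12 * t / num_frames)
    for t in range(num_frames)
]

# Illustrative low-pass coefficients (an assumption, not taken from the thesis).
h = [0.25, 0.5, 0.25]

y = temporal_filter(x, h)
before = modulation_spectrum(x)
after = modulation_spectrum(y)
```

In this toy case the fast component at bin 12 is strongly attenuated while the slow component at bin 2 is largely preserved, which is the kind of reshaping of the modulation spectrum that a temporal filter performs.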
Computers and related products have become a necessity of modern life, and they are often small, lightweight, and even invisible. As a result, traditional man-machine interfaces such as the keyboard and mouse are no longer convenient. Voice, on the other hand, is a natural and efficient way for people to communicate with this new equipment; with well-developed speech recognition techniques, it is no longer a dream for us to "talk" with machines.
However, the performance of a speech recognizer is often limited by its application environment. For example, background noise and channel effects often degrade recognition accuracy seriously. Researchers have proposed many approaches to enhance recognizer performance in adverse environments. In this thesis, we focus on developing new temporal filtering techniques for speech features in order to improve their robustness in noisy speech recognition.
The newly proposed temporal filters are based on the statistical information of the modulation spectrum of speech features. They are derived from constrained versions of Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA), and Maximum Class Distance (MCD), respectively.
The results of a series of experiments on the Aurora 2.0 database show that the proposed temporal filters effectively enhance recognition performance in noisy environments, and that they can be integrated with other temporal filtering approaches, Cepstral Mean and Variance Normalization (CMVN) and Cepstral Gain Normalization (CGN), to provide further improvements.
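As a rough illustration of the CMVN step mentioned above, the sketch below applies per-utterance CMVN in plain Python, assuming the common definition in which each feature dimension is shifted to zero mean and scaled to unit variance over the frames of one utterance. The function name `cmvn` and the toy data are hypothetical, and the thesis may differ in details such as the normalization window.

```python
import math


def cmvn(frames):
    """Normalize each feature dimension to zero mean and unit variance over one utterance."""
    num_frames = len(frames)
    dim = len(frames[0])
    means = [sum(f[d] for f in frames) / num_frames for d in range(dim)]
    stds = []
    for d in range(dim):
        var = sum((f[d] - means[d]) ** 2 for f in frames) / num_frames
        # Guard against constant dimensions to avoid division by zero.
        stds.append(math.sqrt(var) if var > 0 else 1.0)
    return [[(f[d] - means[d]) / stds[d] for d in range(dim)] for f in frames]


# Hypothetical utterance: 4 frames, 2 feature dimensions with different offsets and scales.
utterance = [[1.0, 10.0], [2.0, 14.0], [3.0, 18.0], [4.0, 22.0]]
normalized = cmvn(utterance)
```

Because the second dimension here is a shifted and scaled copy of the first, both columns of `normalized` coincide after CMVN, which shows how the normalization removes per-dimension offset and gain differences.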
Abstract (Chinese)
Abstract (English)
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1-1 Research Motivation
1-2 Overview of Research Methods
1-2.1 Convolutional Noise
1-2.2 Additive Noise
1-3 Thesis Organization
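Chapter 2 Experimental Background and Baseline System Setup
2-1 Description and Analysis of the Speech Database
2-2 Speech Feature Extraction
2-3 Acoustic Models and Evaluation of Recognition Performance
2-4 Baseline System Results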
Chapter 3 Overview of Temporal Filters
3-1 Corpus-Independent Temporal Filters
3-1.1 RelAtive SpecTrA (RASTA) Temporal Filter
3-1.2 Cepstral Mean Subtraction (CMS)
3-1.3 Cepstral Mean and Variance Normalization (CMVN)
3-1.4 Cepstral Gain Normalization (CGN)
3-2 Corpus-Dependent Temporal Filters
3-2.1 Principal Component Analysis (PCA) Temporal Filter
3-2.2 Linear Discriminant Analysis (LDA) Temporal Filter
3-2.3 Maximum Class Distance (MCD) Temporal Filter
Chapter 4 Experimental Results for the Temporal Filters
4-1 Results for Corpus-Independent Temporal Filters
4-1.1 Results for the RelAtive SpecTrA (RASTA) Temporal Filter
4-1.2 Results for Cepstral Mean Subtraction (CMS)
4-1.3 Results for Cepstral Mean and Variance Normalization (CMVN)
4-1.4 Results for Cepstral Gain Normalization (CGN)
4-2 Results for Corpus-Dependent Temporal Filters
4-2.1 Results for PCA Combined with CMVN
4-2.2 Results for LDA Combined with CMVN
4-2.3 Results for MCD Combined with CMVN
4-3 Conclusions and Analysis
Chapter 5 Temporal Filters Derived from the Modulation Spectrum
5-1 Constrained Linear Discriminant Analysis (C-LDA)
5-2 Constrained Principal Component Analysis (C-PCA)
5-3 Constrained Maximum Class Distance (C-MCD)
5-4 Experimental Results for Temporal Filters in the Modulation Spectrum
5-4.1 Results for Constrained Linear Discriminant Analysis
5-4.2 Results for Constrained Principal Component Analysis
5-4.3 Results for Constrained Maximum Class Distance
5-5 Conclusions
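Chapter 6 Combining Modulation-Spectrum Temporal Filters with Normalization Methods
6-1 Results for CMVN Combined with Modulation-Spectrum Temporal Filters
6-1.1 Results for CMVN Combined with C-LDA
6-1.2 Results for CMVN Combined with C-PCA
6-1.3 Results for CMVN Combined with C-MCD
6-2 Results for CGN Combined with Modulation-Spectrum Temporal Filters
6-2.1 Results for CGN Combined with C-LDA
6-2.2 Results for CGN Combined with C-PCA
6-2.3 Results for CGN Combined with C-MCD
6-3 Conclusions and Analysis
Chapter 7 Conclusions and Future Work
References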