National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)

Graduate student: 黃永勝 (Yung-Sheng Huang)
Thesis title: Robust Speech Recognition: Improved Temporal Filtering on Speech Feature Coefficients
Thesis title (Chinese): 強健性語音辨認之研究:語音特徵係數之時間序列濾波器的改進技術
Advisor: 洪志偉 (Jeih-weih Hung)
Degree: Master's
Institution: National Chi Nan University (國立暨南國際大學)
Department: Department of Electrical Engineering
Discipline: Engineering
Field: Electrical and Computer Engineering
Thesis type: Academic thesis
Year of publication: 2004
Academic year of graduation: 92
Language: Chinese
Keywords: Robust Speech Recognition; Temporal Filter; Maximum Mutual Information; Multi-Eigenvector
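Abstract (translated from the Chinese):
In recent years, the rapid development of high-tech industries has tied people's lives ever more closely to computers. To make communication with computers more convenient, human-machine interfaces based on speech recognition have become one of the most important research topics and are already used in daily life. The most basic and most important obstacle to widespread use of speech recognition, however, is recognition accuracy: when the training environment and the application environment of a recognition system do not match, the recognition rate usually drops severely. Most robustness techniques for speech recognition address this environmental mismatch. This thesis targets the mismatch problem and develops a series of robust processing methods for speech feature parameters that weaken noise interference from the external environment and thereby reduce the degree of mismatch. The research is divided into four parts: (1) maximum mutual information temporal filters, (2) a study of the length of data-driven temporal filters, (3) multi-eigenvector temporal filters, and (4) the combination of temporal filters with spectral subtraction.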
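In the first part, the maximum mutual information (MMI) temporal filter is presented as a new data-driven temporal filter, designed on the basis of mutual information from information theory. Experiments show that the MMI temporal filter is effective in the same way as the RASTA temporal filter: it handles additive noise well and improves the recognition rate, and it is more effective on nonstationary noise than on stationary noise.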
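In the second part, filter length is analyzed in depth to examine its effect on the recognition rate. The experimental results show that the choice of filter length does affect the recognition rate: the linear discriminant analysis (LDA), principal component analysis (PCA), minimum classification error (MCE), and feature-based MMI temporal filters reach their best recognition accuracy at filter lengths of 11, 11, 101, and 101, respectively.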
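In the third part, the multi-eigenvector temporal filter is introduced. The filter is obtained by weighting several eigenvectors by their corresponding eigenvalues, or by the square roots of those eigenvalues, and summing them. The aim is to combine more information so that the speech features become more representative and more robust. Experiments show that for LDA temporal filters, weighting by the square roots of the eigenvalues gives higher recognition accuracy than weighting by the eigenvalues, whereas for PCA temporal filters, weighting by the eigenvalues gives clearly higher accuracy.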
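Finally, in the fourth part, nonlinear spectral subtraction is used to enhance the speech signal in the spectral domain and reduce the influence of noise, after which data-driven temporal filters are applied to the speech features in the cepstral domain. Experiments confirm that when nonlinear spectral subtraction and data-driven temporal filters are combined, the advantages of both remain and their noise-handling abilities add up, which improves the recognition rate further. This also confirms that combining robustness methods from two different domains has an additive effect.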
Abstract (English):
In recent years, with the rapid expansion of the high-tech industry, computers have become increasingly important in daily life. To make communication with computers more convenient, speech recognition as a man-machine interface has become one of the most important research topics and has already been applied in daily life. The most important problem for speech recognition techniques, however, is recognition accuracy. When the acoustic conditions of the training and application environments of a speech recognition system do not match, the performance of the system is very often seriously degraded. Hence, robustness of speech recognition techniques with respect to such mismatched acoustic conditions becomes very important. In this thesis, we develop several robust techniques for speech feature coefficients that alleviate the effect of environmental noise and reduce the degree of mismatch. The thesis covers four topics: (1) a maximum mutual information temporal filter approach, (2) a study of the length of data-driven temporal filters, (3) a multi-eigenvector temporal filter approach, and (4) the combination of temporal filters with nonlinear spectral subtraction.
In the first part, we propose a new optimization criterion, maximum mutual information (MMI), for deriving temporal filters. Experimental results show that the MMI-derived temporal filters improve recognition performance significantly, as the RASTA temporal filter does. In particular, MMI-derived temporal filters handle nonstationary noise better than stationary noise.
In the second part, we analyze the length of the FIR filters and discuss its effect on recognition accuracy. Experimental results show that the filter length does affect recognition. We found that the optimal lengths for the LDA-, PCA-, MCE-, and feature-based MMI-derived filters are 11, 11, 101, and 101, respectively.
In the third part, we introduce the multi-eigenvector temporal filter approach, in which the first M eigenvectors obtained by LDA or PCA are weighted by their corresponding eigenvalues, or by the square roots of those eigenvalues, and summed to form the filter coefficients. The aim is to include more information and so make the feature coefficients more robust. In the experiments, we found that the highest recognition accuracy is obtained with square-root-of-eigenvalue weights for LDA filters and with eigenvalue weights for PCA filters.
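The weighted-sum construction described above can be sketched in a few lines of NumPy. This is only an illustrative sketch for the PCA case, not the thesis's implementation: the function names, the default values, the way training windows are collected from a single trajectory, and the sign convention for eigenvectors (eigenvector signs are arbitrary, so some convention is needed before summing) are all assumptions made here.

```python
import numpy as np

def multi_eigenvector_filter(trajectory, length=11, m=3, sqrt_weights=False):
    """Build one FIR temporal filter from the top-m PCA eigenvectors.

    trajectory: 1-D array holding one feature coefficient over time (frames).
    All names and defaults here are hypothetical, chosen for illustration.
    """
    # Every run of `length` consecutive frames becomes one training vector.
    windows = np.lib.stride_tricks.sliding_window_view(trajectory, length)
    # PCA case: eigen-decompose the covariance of the windows.
    # (An LDA variant would use class scatter matrices instead.)
    cov = np.cov(windows, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)        # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:m]         # indices of the m largest
    top_vals = np.clip(eigvals[order], 0.0, None)
    top_vecs = eigvecs[:, order]
    # Assumed sign convention: make each eigenvector's largest-magnitude
    # entry positive so the weighted sum does not depend on solver signs.
    cols = np.arange(top_vecs.shape[1])
    peak_rows = np.argmax(np.abs(top_vecs), axis=0)
    top_vecs = top_vecs * np.sign(top_vecs[peak_rows, cols])
    # Weight by eigenvalues or by their square roots, then sum.
    weights = np.sqrt(top_vals) if sqrt_weights else top_vals
    return top_vecs @ weights

def apply_temporal_filter(trajectory, coeffs):
    """Slide the filter along time (inner product of each window with coeffs)."""
    return np.convolve(trajectory, coeffs[::-1], mode="same")
```

Calling `multi_eigenvector_filter(x, length=11, m=3, sqrt_weights=True)` on a coefficient trajectory `x` gives the square-root-weighted variant; in practice the windows would be pooled over a whole training corpus rather than one utterance.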
In the last part, we use nonlinear spectral subtraction to enhance the speech signal in the spectral domain and so reduce the influence of noise. We then apply temporal filters in the cepstral domain. Experimental results show that when the temporal filters are combined with spectral subtraction, recognition performance improves further. Combining the two approaches, which work in different domains, therefore has an additive effect.
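The two-stage pipeline (enhance in the spectral domain, then filter feature trajectories in the cepstral domain) can be sketched as follows. This is a simplified stand-in under stated assumptions, not the thesis's method: it uses plain magnitude spectral subtraction with an over-subtraction factor and a spectral floor rather than the nonlinear variant, it estimates noise from the first few frames, and it takes a DCT of the log spectrum directly without a mel filter bank. All function names and parameter values are hypothetical.

```python
import numpy as np

def spectral_subtract(mag, noise_mag, alpha=2.0, beta=0.01):
    """Basic magnitude spectral subtraction with flooring.

    mag: (frames, bins) magnitude spectra; noise_mag: (bins,) noise estimate.
    alpha (over-subtraction) and beta (floor) are illustrative values.
    """
    cleaned = mag - alpha * noise_mag
    # Floor at a fraction of the noisy magnitude to avoid negative spectra.
    return np.maximum(cleaned, beta * mag)

def log_cepstra(mag, n_ceps=13):
    """DCT-II of the log magnitude spectrum (no mel filter bank, for brevity)."""
    n_bins = mag.shape[1]
    k = np.arange(n_ceps)[:, None]
    n = np.arange(n_bins)[None, :]
    dct = np.cos(np.pi * k * (n + 0.5) / n_bins)   # shape (n_ceps, n_bins)
    return np.log(mag + 1e-10) @ dct.T

def filter_trajectories(ceps, coeffs):
    """Filter each cepstral coefficient's time trajectory with the same FIR filter."""
    return np.stack(
        [np.convolve(ceps[:, i], coeffs, mode="same") for i in range(ceps.shape[1])],
        axis=1,
    )

def enhance_then_filter(mag, coeffs, noise_frames=10):
    """Spectral-domain enhancement followed by cepstral-domain temporal filtering."""
    # Assumption: the first `noise_frames` frames contain noise only.
    noise_mag = mag[:noise_frames].mean(axis=0)
    ceps = log_cepstra(spectral_subtract(mag, noise_mag))
    return filter_trajectories(ceps, coeffs)
```

Here `coeffs` would be a data-driven temporal filter (for example an LDA-, PCA-, MCE-, or MMI-derived one); a short moving average such as `np.ones(5) / 5` is enough to exercise the code.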
Acknowledgments
Abstract (Chinese)
Abstract (English)
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Research Motivation
1.2 Overview of Research Methods
1.3 Thesis Organization
Chapter 2 Experimental Background and Baseline System Setup
2.1 Speech Database
2.2 Speech Feature Extraction
2.3 Acoustic Model Construction
2.4 Evaluation of Recognition Performance
2.5 Experimental Results of the Baseline System
2.6 Chapter Summary
Chapter 3 Robust Speech Recognition Techniques
3.1 Temporal Filters
3.1.1 Data-Independent Temporal Filters
3.1.2 Data-Driven Temporal Filters
3.2 Spectral Subtraction
3.2.1 Nonlinear Spectral Subtraction
3.3 Chapter Summary
Chapter 4 Maximum Mutual Information Temporal Filters
4.1 Design of Maximum Mutual Information Temporal Filters
4.2 Feature-Based Maximum Mutual Information Temporal Filters
4.3 Model-Based Maximum Mutual Information Temporal Filters
4.4 Chapter Summary
Chapter 5 Experimental Results for Temporal Filters
5.1 Experimental Results for Data-Independent Temporal Filters
5.2 Experimental Results for Data-Driven Temporal Filters
5.3 Discussion of Experimental Results for Feature-Based Maximum Mutual Information Temporal Filters and Other Temporal Filters
5.4 Chapter Summary
Chapter 6 The Length of Data-Driven Temporal Filters
6.1 Experimental Results on the Length of Data-Driven Temporal Filters
6.2 Experimental Results for Data-Driven Temporal Filters Combined with Cepstral Normalization
6.3 Chapter Summary
Chapter 7 Multi-Eigenvector Temporal Filters
7.1 Multi-Eigenvector Temporal Filters
7.2 Selecting the Parameter M for Multi-Eigenvector Temporal Filters
7.3 Experimental Results for Multi-Eigenvector Temporal Filters
7.4 Experimental Results for Multi-Eigenvector Temporal Filters Combined with Cepstral Normalization
7.5 Chapter Summary
Chapter 8 Combining Temporal Filters with Spectral Subtraction
8.1 Experimental Results for Nonlinear Spectral Subtraction
8.2 Experimental Results for Data-Driven Temporal Filters Combined with Nonlinear Spectral Subtraction
8.3 Experimental Results for Spectral Subtraction Combined with Multi-Eigenvector Temporal Filters
8.4 Chapter Summary
Chapter 9 Conclusions and Future Work
9.1 Conclusions
9.2 Future Work
References