National Digital Library of Theses and Dissertations in Taiwan

Detailed Record

Researcher: 謝宗學
Researcher (English): Tsung-Hsueh Hsieh
Thesis Title: 加成性雜訊環境下運用特徵參數統計補償法於強健性語音辨識
Thesis Title (English): Feature Statistics Compensation for Robust Speech Recognition in Additive Noise Environments
Advisor: 洪志偉
Advisor (English): Jeih-weih Hung
Degree: Master's
Institution: National Chi Nan University (國立暨南國際大學)
Department: Department of Electrical Engineering (電機工程學系)
Discipline: Engineering
Academic Field: Electrical and Information Engineering
Thesis Type: Academic thesis
Publication Year: 2007
Graduation Academic Year: 95 (2006-2007)
Language: Chinese
Number of Pages: 93
Keywords (Chinese): 自動語音辨識、虛擬雙通道碼簿、倒頻譜統計補償法、線性最小平方回歸法、二次最小平方回歸法
Keywords (English): automatic speech recognition, pseudo stereo codebooks, cepstral statistics compensation, linear least squares regression, quadratic least squares regression
Related statistics:
  • Cited by: 3
  • Views: 144
  • Downloads: 0
  • Bookmarked: 1
In automatic speech recognition research, how to effectively reduce the influence of background noise has long been a central issue, and many improvement methods have been proposed over the history of the field; utterance-based cepstral mean and variance normalization (U-CMVN) and segmental cepstral mean and variance normalization (S-CMVN) belong to this category. These two feature normalization methods, commonly used in conventional robust speech recognition to reduce the effect of noise, are feature equalization techniques that normalize statistics estimated from the whole utterance or from utterance segments; however, their statistics estimates are not very accurate, and they cannot be carried out in an online manner.
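For reference, the operation underlying both U-CMVN and S-CMVN, cepstral mean and variance normalization, can be written in its standard form as follows (a generic sketch using common notation, not taken from the thesis itself):

\[
\hat{c}_t[d] = \frac{c_t[d] - \mu[d]}{\sigma[d]},
\qquad
\mu[d] = \frac{1}{T}\sum_{t=1}^{T} c_t[d],
\qquad
\sigma^2[d] = \frac{1}{T}\sum_{t=1}^{T} \bigl(c_t[d] - \mu[d]\bigr)^2,
\]

where c_t[d] is the d-th cepstral coefficient of frame t, and the sums run over all T frames of the utterance (U-CMVN) or over the frames of a single segment (S-CMVN).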
In this thesis, instead of the utterance- or segment-wise statistics estimation used in U-CMVN and S-CMVN, we construct two codebooks that represent the training speech and the testing speech, respectively, which we call pseudo stereo codebooks. Based on the pseudo stereo codebooks, we develop three feature compensation approaches: cepstral statistics compensation (CSC), linear least squares (LLS) regression, and quadratic least squares (QLS) regression. We describe how statistics representing the training and testing speech are obtained from the codebooks, and how the three compensation approaches then use these statistics to make the speech features more robust and improve recognition performance. These approaches are simple, achieve better results in our experiments, and can be carried out in an online manner.
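The following Python sketch illustrates the codebook-based compensation idea, under the assumption that CSC shifts and scales the noisy features so that their statistics, as summarized by the noisy codebook, match those of the clean (training) codebook; the function names, codebook sizes, and random data are hypothetical stand-ins, not the thesis implementation.

import numpy as np

def codebook_stats(codewords):
    # Mean and standard deviation over the codewords of one codebook.
    mu = codewords.mean(axis=0)
    sigma = codewords.std(axis=0) + 1e-8  # guard against division by zero
    return mu, sigma

def csc_compensate(noisy_feats, clean_codebook, noisy_codebook):
    # Re-normalize noisy features so their statistics match the clean codebook.
    mu_c, sigma_c = codebook_stats(clean_codebook)
    mu_n, sigma_n = codebook_stats(noisy_codebook)
    return (noisy_feats - mu_n) / sigma_n * sigma_c + mu_c

# Random stand-ins for 13-dimensional cepstral codewords and noisy test frames.
rng = np.random.default_rng(0)
clean_codebook = rng.normal(0.0, 1.0, size=(64, 13))
noisy_codebook = rng.normal(0.5, 2.0, size=(64, 13))
noisy_frames = rng.normal(0.5, 2.0, size=(200, 13))
compensated = csc_compensate(noisy_frames, clean_codebook, noisy_codebook)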
We apply the three approaches to four different types of cepstral features: mel-frequency cepstral coefficients (MFCC), autocorrelation mel-frequency cepstral coefficients (AMFCC), linear prediction cepstral coefficients (LPCC), and perceptual linear prediction cepstral coefficients (PLPCC). Experiments on the AURORA-2 corpus show that, for each type of speech features, the three proposed approaches further improve recognition performance under various noise environments. Moreover, compared with the conventional U-CMVN and S-CMVN, the three approaches provide better recognition accuracy.
Improving the accuracy of a speech recognition system under a mismatched noisy environment has always been a major research issue in the speech processing area. A great number of approaches have been proposed to reduce this environmental mismatch, and one class of these approaches focuses on normalizing the statistics of speech features under different noise conditions. The well-known utterance-based cepstral mean and variance normalization (U-CMVN) and segmental cepstral mean and variance normalization (S-CMVN) both belong to this class. Both of them make use of the whole utterance or segments of an utterance to estimate the statistics, which may not be accurate enough, and they cannot be implemented in an on-line manner.
In this thesis, instead of estimating the statistics in an utterance-wise manner as in U-CMVN and S-CMVN, we construct two sets of codebooks, called pseudo stereo codebooks, which represent the speech features in clean and noisy environments, respectively. Then, based on the pseudo stereo codebooks, we develop three feature compensation approaches: cepstral statistics compensation (CSC), linear least squares (LLS) regression, and quadratic least squares (QLS) regression. These new approaches are simple yet very effective, and they can be implemented in an on-line manner.
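To make the two regression-based approaches concrete, the sketch below fits a per-dimension linear (LLS) and quadratic (QLS) least-squares polynomial mapping noisy codewords to their paired clean codewords, then applies it to noisy feature frames. It assumes a one-to-one pairing of codewords and uses random stand-in data, so it illustrates the general least-squares idea rather than the thesis's exact formulation.

import numpy as np

def fit_ls_mapping(noisy_cw, clean_cw, order=1):
    # Least-squares polynomial per feature dimension: clean ~ poly(noisy).
    return [np.polyfit(noisy_cw[:, d], clean_cw[:, d], order)
            for d in range(noisy_cw.shape[1])]

def apply_ls_mapping(feats, coeffs):
    # Map each dimension of the noisy features through its fitted polynomial.
    return np.column_stack([np.polyval(coeffs[d], feats[:, d])
                            for d in range(feats.shape[1])])

# Hypothetical paired codewords and noisy frames (random stand-in data).
rng = np.random.default_rng(1)
clean_cw = rng.normal(size=(64, 13))
noisy_cw = 1.5 * clean_cw + 0.3 + 0.1 * rng.normal(size=(64, 13))
noisy_frames = 1.5 * rng.normal(size=(200, 13)) + 0.3

lls_feats = apply_ls_mapping(noisy_frames, fit_ls_mapping(noisy_cw, clean_cw, order=1))
qls_feats = apply_ls_mapping(noisy_frames, fit_ls_mapping(noisy_cw, clean_cw, order=2))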
We apply the proposed three approaches to four different types of cepstral features, including mel-frequency cepstral coefficients (MFCC), autocorrelation mel-frequency cepstral coefficients (AMFCC), linear prediction cepstral coefficients (LPCC), and perceptual linear prediction cepstral coefficients (PLPCC). Experiments conducted on the Aurora-2 database show that, for each type of speech features, the proposed three approaches bring about very encouraging performance improvements under various noise environments. Besides, compared with the traditional utterance-based CMVN and segmental CMVN, the three approaches provide further improved recognition accuracy.
Acknowledgements
Abstract (Chinese)
Abstract (English)
Table of Contents
List of Figures
List of Tables
Chapter 1  Introduction
1-1 Research Motivation
1-2 Overview of Robust Speech Recognition Methods
1-3 Overview of the Proposed Methods
1-4 Chapter Outline
Chapter 2  Introduction to Speech Feature Extraction Procedures
2-1 Mel-Frequency Cepstral Coefficients (MFCC)
2-2 Autocorrelation Mel-Frequency Cepstral Coefficients (AMFCC)
2-3 Linear Prediction Cepstral Coefficients (LPCC)
2-4 Perceptual Linear Prediction Cepstral Coefficients (PLPCC)
2-5 Construction of the Acoustic Models
Chapter 3  Utterance-Based and Segmental Robust Feature Equalization Techniques
3-1 Utterance-Based Cepstral Mean and Variance Normalization (U-CMVN)
3-2 Segmental Cepstral Mean and Variance Normalization (S-CMVN)
3-3 Discussion
Chapter 4  Noise-Robustness Techniques Based on Pseudo Stereo Codebooks
4-1 Construction of the Pseudo Stereo Codebooks
4-2 Cepstral Statistics Compensation (CSC)
4-3 Linear Least Squares (LLS) Regression and Quadratic Least Squares (QLS) Regression
4-4 Chapter Summary
Chapter 5  Construction of the Baseline Recognition System
5-1 Overview of the Speech Corpus
5-2 Evaluation of Recognition Performance
5-3 Training and Results of the Recognition System
5-4 Discussion
Chapter 6  Experimental Results of the Robust Feature Techniques
6-1 Results of Utterance-Based Cepstral Mean and Variance Normalization
6-2 Results of Segmental Cepstral Mean and Variance Normalization
6-3 Results of Cepstral Statistics Compensation
6-4 Results of Linear Least Squares Regression
6-5 Results of Quadratic Least Squares Regression
6-6 Overall Discussion
Chapter 7  Conclusions and Future Work
References