National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)

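Author: Chien, Jen-Tzung (簡仁宗)
Thesis title: Speech Recognition under Telephone Environments (電話環境下語音辨認之研究)
Advisors: Hsiao-Chuan Wang (王小川), Chin-Hui Lee (李錦輝)
Degree: Doctoral
Institution: National Tsing Hua University (國立清華大學)
Department: Department of Electrical Engineering
Discipline: Engineering
Field: Electrical and computer engineering
Thesis type: Academic thesis
Year of publication: 1997
Academic year of graduation: 85 (ROC calendar)
Language: Chinese
Number of pages: 110
Keywords (translated from Chinese): speech recognition; robustness; maximum a posteriori; channel cancellation; noise reduction; speaker adaptation
Keywords (English): speech recognition; robustness; maximum a posteriori; channel cancellation; noise reduction; speaker adaptation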
Usage counts:
  • Cited by: 2
  • Views: 297
  • Rating:
  • Downloads: 0
  • Saved to reading lists: 1
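Abstract (translated from the Chinese):
When a speech recognition system is deployed in a telephone environment, the mismatch between the training and testing acoustic environments often degrades recognition performance. The sources of distortion in the telephone environment include noise, channel, and speaker distortions. This dissertation proposes a series of robust algorithms that compensate for these three sources of distortion in order to improve recognition performance. In speech recognition experiments based on hidden Markov models (HMMs), all of the proposed methods successfully overcome the distortion problems of the telephone environment.

The dissertation first analyzes the effect of noise on speech cepstral vectors and on HMM parameters. Because cepstral vectors shrink under noise contamination, the projection-based likelihood measure, which adjusts the mean vectors of the HMM parameters with an optimal shrinkage factor, is robust to noise. This dissertation extends that work by further compensating for the shrinkage of the model variances and for the adaptation bias of the means. The compensation factors are obtained from a set of adaptation functions. Experiments show that this method clearly improves the recognition rate.

To overcome the channel effect in telephone speech, the dissertation develops a channel-effect-cancellation method. The method first quantizes the cepstral vectors of a number of inverse telephone channel models to train a set of reference filters. The cepstral vector of the channel-effect-cancellation filter is then formed as a linear combination of the cepstral vectors of these reference filters, with combination coefficients derived from the accumulated observation probabilities of the telephone speech passed through the reference filters. This method effectively removes the channel effect from telephone speech. Next, the dissertation proposes two transformation-based adaptation methods that adjust the HMM parameters so that the adapted parameters are closer to the telephone environment encountered at test time. The two methods are the bias transformation and the affine transformation. The transformation parameters are estimated with the maximum a posteriori (MAP) criterion, which takes prior statistics into account. In our experimental evaluation, adaptation using the MAP criterion outperforms adaptation using the maximum likelihood (ML) criterion, and the affine transformation is more accurate than the bias transformation. In addition, the dissertation proposes a phone-dependent channel compensation method, which uses a few adaptation utterances to adapt the original HMM parameters to a new channel environment. Adaptation is performed by combining the model parameters with their corresponding phone-dependent channel compensation vectors. To improve the adaptation, we propose two extensions: the first uses vector quantization to refine the compensation vectors, and the second linearly interpolates the compensation vectors. Both techniques have been applied successfully to telephone speech recognition and to speaker adaptation.

We also propose a hybrid algorithm that adapts speaker-independent HMMs to the characteristics of a new speaker. The method combines three adaptation techniques. First, the model parameters in different clusters are transformed with their corresponding transformation functions. The transformed parameters are then adapted by MAP estimation. Finally, the model parameters left unadapted by MAP estimation are further adjusted by transfer vector interpolation. Experiments show that the method captures the advantages of all three techniques at once and outperforms other adaptation methods for adaptation utterances of various lengths.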
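
As a rough illustration of the channel-effect-cancellation idea described in the abstract (a cancellation filter formed as a convex combination of reference filters, weighted by accumulated observation probabilities), the following Python sketch may help. It is not the dissertation's implementation. Several things here are assumptions made only for illustration: a single diagonal-covariance Gaussian stands in for the HMM, filtering is modeled as an additive offset in the cepstral domain, the weights are obtained by normalizing the accumulated likelihoods, and the names `log_gauss_diag` and `channel_cancellation_filter` are hypothetical.

```python
import numpy as np

def log_gauss_diag(frames, mean, var):
    """Per-frame log density of a diagonal-covariance Gaussian.

    frames: (T, D) cepstral vectors; mean, var: (D,) model parameters.
    The single Gaussian is an assumed stand-in for an HMM.
    """
    diff = frames - mean
    return -0.5 * np.sum(np.log(2.0 * np.pi * var) + diff ** 2 / var, axis=1)

def channel_cancellation_filter(frames, ref_filters, mean, var):
    """Combine reference filters into one cancellation filter (cepstral domain).

    ref_filters: (K, D) cepstra of the reference filters.
    Returns the combined filter cepstrum (D,) and the convex weights (K,).
    """
    # Accumulated log-probability of the utterance after each reference filter.
    # Filtering is an additive offset in the cepstral domain.
    scores = np.array(
        [log_gauss_diag(frames + r, mean, var).sum() for r in ref_filters]
    )
    # Normalize to convex weights (non-negative, summing to one).
    # Subtracting the maximum keeps the exponentials numerically stable.
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ ref_filters, weights

# Toy usage: clean speech ~ N(0, I) distorted by an unknown channel offset h.
rng = np.random.default_rng(0)
h = np.array([1.0, -0.5, 0.3, 0.8])
telephone_frames = rng.standard_normal((200, 4)) + h
reference_filters = np.stack([-h, np.zeros(4), h])
cancel, weights = channel_cancellation_filter(
    telephone_frames, reference_filters, np.zeros(4), np.ones(4)
)
compensated = telephone_frames + cancel  # apply the cancellation filter
```

Because the weights are non-negative and sum to one, the combined filter always lies inside the region spanned by the reference filters, which is what makes the combination convex rather than an unconstrained linear fit.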
Abstract (English):
When a speech recognition system is operated over telephone networks, the acoustic mismatch between the training and testing environments degrades recognition performance. The mismatch in telephone environments is attributed to ambient noise, the channel effect, and the variation among speakers. This dissertation describes a number of robust algorithms that improve recognition performance by compensating for these three mismatch factors. In experiments on hidden Markov model (HMM) based speech recognition, the proposed methods successfully overcome the mismatch problems in telephone environments.

The effect of noise on the speech cepstral vector and its associated HMM acoustic parameters is investigated first. Because the cepstral vector shrinks in a noisy environment, the projection-based likelihood measure, which uses an optimal equalization factor to adapt the cepstral mean vectors of the HMM parameters, is robust to noise contamination. This dissertation extends the measure by further compensating for the shrinkage of the covariance matrix and the bias of the mean vector. The compensation factors are obtained from a set of adaptation functions. With this method, recognition accuracy improves markedly.

To overcome the channel effect in telephone speech, a channel-effect-cancellation method is developed. The approach estimates a channel-effect-cancellation filter as a convex combination of several reference filters. The reference filters, represented in the cepstral domain, are generated by clustering the cepstra of inverse telephone channels. The convex combination coefficients are calculated from the accumulated observation probabilities obtained when the testing utterance passes through the reference filters. With this method, most of the channel effect can be canceled. Next, the dissertation presents two transformation-based adaptation approaches that adapt the HMM parameters so that the adapted parameters are acoustically close to the telephone environment. The bias and the affine transformations are examined. The transformation parameters are estimated with the maximum a posteriori (MAP) technique, which incorporates prior knowledge of the transformation. In our evaluation, transformation-based adaptation using MAP estimation outperforms that using maximum likelihood (ML) estimation. The affine transformation is also shown to be superior to the bias transformation. Furthermore, a phone-dependent channel compensation (PDCC) technique is proposed for adapting the HMM parameters to a new channel environment using some adaptation data. The HMM parameters are adapted by incorporating the corresponding PDCC vectors. To improve performance, two extended PDCC techniques are presented. One is based on refining the PDCC vectors using vector quantization. The other is based on interpolating the compensation vectors. These techniques are shown to be effective in telephone speech recognition as well as in speaker adaptation.

In addition, we propose a hybrid algorithm for adapting the HMM parameters to a new speaker. The algorithm is constructed by iteratively and alternately combining three adaptation techniques. First, the clusters of HMM parameters are locally transformed through a group of transformation functions. Then, the transformed HMM parameters are globally smoothed via MAP adaptation. Within the MAP adaptation, the parameters of units unseen in the adaptation data are further adapted by applying the transfer vector interpolation scheme. With this algorithm, the advantages of the three adaptation techniques are captured simultaneously. The resulting performance is consistently better than that of other methods for almost any practical amount of adaptation data.
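
To make the contrast between ML and MAP estimation of a bias transformation more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the estimator derived in the dissertation: each frame is assumed to be already aligned to one diagonal-covariance Gaussian, the bias is a single vector shared by all Gaussians, the prior on the bias is assumed Gaussian with diagonal variance, and the function name `estimate_bias` and its parameters are hypothetical.

```python
import numpy as np

def estimate_bias(frames, means, variances, prior_mean=None, prior_var=None):
    """Estimate a shared bias b such that frames ~ N(means + b, variances).

    frames, means, variances: (T, D) arrays, one aligned Gaussian per frame
    (the hard alignment is an assumption of this sketch).
    With no prior, this is the ML estimate. With an assumed Gaussian prior
    N(prior_mean, prior_var) on b, it is the MAP estimate.
    """
    precision = 1.0 / variances
    # Precision-weighted sum of residuals and total precision, per dimension.
    numerator = np.sum(precision * (frames - means), axis=0)
    denominator = np.sum(precision, axis=0)
    if prior_mean is not None:
        # The prior acts like extra pseudo-observations of the bias.
        numerator = numerator + prior_mean / prior_var
        denominator = denominator + 1.0 / prior_var
    return numerator / denominator

# Toy usage: four frames, each offset from its model mean by exactly 2.0.
means = np.zeros((4, 2))
variances = np.ones((4, 2))
frames = means + 2.0
b_ml = estimate_bias(frames, means, variances)
b_map = estimate_bias(frames, means, variances,
                      prior_mean=np.zeros(2), prior_var=np.ones(2))
```

In this formulation the MAP estimate is pulled toward the prior mean when adaptation data is scarce and approaches the ML estimate as the number of frames grows, which is one common way to explain why a prior can help with small amounts of adaptation data.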
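Cover
Abstract
Acknowledgments
Table of contents
Chapter 1
Chapter 2
Chapter 3
Chapter 4
Chapter 5
Chapter 6
Chapter 7
Chapter 8
Appendix: English version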