Author: 黃柏景
Author (English): Po-Ching Huang
Title (Chinese): 最大化邊界隱藏式馬可夫模型之證據架構應用於語音辨識
Title (English): Evidence Framework for Large Margin Hidden Markov Model Based Speech Recognition
Advisor: 簡仁宗
Advisor (English): Jen-Tzung Chien
Degree: Master's
Institution: National Cheng Kung University
Department: Institute of Computer Science and Information Engineering
Discipline: Engineering
Field: Electrical and Computer Engineering
Document type: Academic thesis
Publication year: 2008
Graduation academic year: 96 (2007-2008)
Language: Chinese
Pages: 77
Keywords (Chinese): 最大化邊界, 證據架構
Keywords (English): large margin, evidence framework
Record statistics:
  • Cited by: 0
  • Views: 150
  • Downloads: 58
  • Bookmarked: 0
Abstract (Chinese, translated): In this thesis, we apply the Bayesian evidence framework to parameter adaptation of the novel large margin hidden Markov model (LM-HMM) for speech recognition. Our main goal is to improve classification performance by strengthening the generalization ability of the conventional large margin model. Traditionally, a large margin classifier is constructed by jointly maximizing the margin and minimizing the training error, but this does not guarantee that the classifier will also perform well on unseen test data. We therefore propose an evidence framework, grounded in Bayesian inference, for training the large margin model, in which the posterior probability of the training data is computed by marginalizing over the large margin model parameters. The core idea of the method is to sequentially update the model parameters through appropriately selected hyperparameters. In addition, this work applies the expectation-maximization (EM) algorithm to estimate the maximum a posteriori parameters and the maximum evidence hyperparameters within the Bayesian evidence framework. In the experiments, we train acoustic model parameters on the TIMIT corpus and compare the proposed method against other approaches in terms of model generalization and recognition performance.
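For background, the trade-off described in the abstract between margin maximization and training-error minimization can be sketched with a generic soft-margin (hinge-loss) linear classifier. This is standard textbook material for illustration only, not the thesis's LM-HMM training criterion; the data and settings below are made up for the toy example.

```python
import numpy as np

def hinge_objective(w, X, y, C=1.0):
    """Soft-margin objective: margin maximization (small ||w||)
    traded off against training-error minimization (hinge loss)."""
    margins = y * (X @ w)
    hinge = np.maximum(0.0, 1.0 - margins)
    return 0.5 * w @ w + C * hinge.sum()

def train_linear_svm(X, y, C=1.0, lr=0.01, epochs=200):
    """Subgradient descent on the soft-margin objective."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        margins = y * (X @ w)
        active = margins < 1.0  # points violating the margin
        grad = w - C * (y[active][:, None] * X[active]).sum(axis=0)
        w -= lr * grad
    return w

# Toy linearly separable data (bias folded in as a constant feature).
X = np.array([[1.0, 2.0, 1.0], [2.0, 3.0, 1.0],
              [-1.0, -2.0, 1.0], [-2.0, -1.0, 1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
w = train_linear_svm(X, y)
print((np.sign(X @ w) == y).all())
```

As the abstract notes, a classifier trained only on this objective has no guarantee of generalizing to unseen test data, which is the gap the thesis's Bayesian treatment targets.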
Abstract (English): The Bayesian evidence framework is presented in this thesis for speech recognition based on the state-of-the-art large margin hidden Markov model (LM-HMM). Our aim is to improve speech recognition performance by strengthening the generalization of the LM-HMM. Traditionally, a large margin classifier is built by jointly maximizing the margin and minimizing the training error, but the trained LM-HMM is not guaranteed to predict well on unseen test speech. We therefore develop a Bayesian approach to LM-HMM training in which the posterior distribution of the training data is calculated by marginalizing over the LM-HMM parameters. With an appropriate choice of LM-HMM hyperparameters, the proposed evidential LM-HMM (ELM-HMM) is established. The expectation-maximization (EM) algorithm is applied within the Bayesian evidence framework to find the maximum a posteriori parameters and the maximum evidence hyperparameters. In experiments on the TIMIT speech database, the proposed model achieves good generalization and speech recognition performance.
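To make the evidence framework concrete, the sketch below shows MacKay-style type-II maximum-likelihood re-estimation of hyperparameters for Bayesian linear regression, a far simpler model than the LM-HMM. It illustrates the same idea the abstract describes, choosing hyperparameters by maximizing the marginal likelihood (evidence), but it is not the thesis's actual derivation; all names and the toy data are assumptions for illustration.

```python
import numpy as np

def evidence_updates(Phi, t, alpha=1.0, beta=1.0, iters=50):
    """MacKay-style evidence re-estimation of alpha (weight-prior
    precision) and beta (noise precision) for Bayesian linear
    regression with design matrix Phi and targets t."""
    N, M = Phi.shape
    eig = np.linalg.eigvalsh(Phi.T @ Phi)  # eigenvalues of Phi^T Phi
    for _ in range(iters):
        # Posterior over weights given current hyperparameters.
        S_inv = alpha * np.eye(M) + beta * Phi.T @ Phi
        m = beta * np.linalg.solve(S_inv, Phi.T @ t)
        # Effective number of well-determined parameters.
        gamma = np.sum(beta * eig / (alpha + beta * eig))
        # Closed-form hyperparameter re-estimates.
        alpha = gamma / (m @ m)
        beta = (N - gamma) / np.sum((t - Phi @ m) ** 2)
    return alpha, beta, m

# Toy data: noisy line t = 0.5 - 1.0 * x (hypothetical example).
rng = np.random.default_rng(0)
Phi = np.column_stack([np.ones(50), rng.uniform(-1, 1, 50)])
w_true = np.array([0.5, -1.0])
t = Phi @ w_true + rng.normal(0, 0.2, 50)
alpha, beta, m = evidence_updates(Phi, t)
print(np.allclose(m, w_true, atol=0.2))
```

The alternation between posterior inference over parameters and closed-form hyperparameter updates mirrors, at a much smaller scale, the EM-based interleaving of maximum a posteriori and maximum evidence estimation that the abstract attributes to the proposed ELM-HMM.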
Abstract (Chinese)
Abstract (English)
Acknowledgments
Table of Contents
List of Figures
List of Tables
Chapter 1  Introduction
1.1 Preface
1.2 Research Motivation
1.3 Overview of the Proposed Method
1.4 Chapter Outline
Chapter 2  Literature Review
2.1 Introduction to Speech Recognition
2.2 Conventional Discriminative Training Criteria
2.2.1 Minimum Classification Error (MCE) Training
2.2.2 Maximum Mutual Information (MMI) Training
2.3 Large Margin Classifiers
2.3.1 Support Vector Machines (SVM) for Binary Classification
2.3.2 Support Vector Machines for Multiclass Classification
2.4 Introduction to Bayesian Model Selection
Chapter 3  Discriminative Training Based on Large Margins
3.1 Large Margin Estimation (LME)
3.2 Soft Margin Estimation (SME)
3.3 The Evidence Framework Applied to Support Vector Machines
3.3.1 A Bayesian Interpretation of Support Vector Machines
3.3.2 The Evidence Framework Applied to Support Vector Machines
Chapter 4  Evidence Framework for Large Margin Models
4.1 Preface
4.2 A New Separation Measure Based on Log-Likelihood Ratios
4.3 The Evidence Framework Applied to Large Margin Hidden Markov Models
4.3.1 Level 1: Parameter Inference
4.3.2 Level 2: Hyperparameter Inference
4.3.3 Level 3: Model Selection
Chapter 5  Experiments
5.1 Speech Database
5.2 Experimental Setup
5.2.1 Acoustic Model Configuration
5.2.2 Experimental Procedure and Framework
5.2.3 Results and Discussion
Chapter 6  Conclusions and Future Work
References
Author Biography
[1]Y. Altun and T. Hofmann, “Large margin methods for label sequence learning,” in Proc. of Interspeech, pp. 993-996, 2003.
[2]L. Bahl, P. Brown, P. de Souza and R. Mercer, “Maximum mutual information estimation of hidden Markov model parameters for speech recognition,” in Proc. Int. Conf. Acoustics, Speech, Signal Processing (ICASSP), vol. 11, pp. 49-52, 1986.
[3]C. M. Bishop, Pattern Recognition and Machine Learning. New York: Springer Science+ Business Media, 2006.
[4]C. J. C. Burges, “A tutorial on support vector machines for pattern recognition,” Data Mining and Knowledge Discovery, no. 2, pp. 121-167, 1998.
[5]J.-K. Chen and F. K. Soong, “An N-best candidates-based discriminative training for speech recognition applications,” IEEE Trans. on Speech and Audio Processing, vol. 2, no. 1, pp. 206-216, 1994.
[6]W. Chou, B. H Juang and C. H. Lee, “Segmental GPD training of HMM based speech recognizer,” in Proc. Int. Conf. Acoustics, Speech, Signal Processing (ICASSP), vol. 1, pp. 473-476, 1992.
[7]W. Chou, C.-H. Lee and B.-H. Juang, “Minimum error rate training based on N-best string models,” in Proc. Int. Conf. Acoustics, Speech, Signal Processing (ICASSP), vol. 2, pp. 652-655, 1993.
[8]J.-T. Chien, "Quasi-Bayes linear regression for sequential learning of hidden Markov models," IEEE Trans. on Audio, Speech and Language Processing, vol. 10, no. 5, pp.268-278, 2002.
[9]J.-T. Chien and C.-H. Huang, "Bayesian learning of speech duration models," IEEE Trans. on Audio, Speech and Language Processing, vol. 11, no. 6, pp. 558-567, 2003.
[10]J.-T. Chien and S. Furui, "Predictive hidden Markov model selection for speech recognition," IEEE Trans. on Speech and Audio Processing, vol. 13, no. 3, pp. 377-387, 2005.
[11]J.-T. Chien, C.-H. Huang, K. Shinoda and S. Furui, “Towards optimal Bayes decision for speech recognition,” in Proc. Int. Conf. Acoustics, Speech, Signal Processing (ICASSP), vol. 1, pp. 45-48, 2006.
[12]J.-T. Chien and C.-H. Huang, “Aggregate a posteriori linear regression adaptation,” IEEE Trans. on Speech and Audio Processing, vol. 14, pp. 797-807, 2006.
[13]R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification, 2nd ed. New York: Wiley, 2001.
[14]B.-H. Juang and S. Katagiri, “Discriminative learning for minimum error classification,” IEEE Trans. on Acoustics, Speech, and Signal Processing, vol. 40, no. 12, pp. 3043-3054, 1992.
[15]B.-H. Juang, W. Chou and C.-H. Lee, “Minimum classification error rate methods for speech recognition,” IEEE Trans. Speech and Audio Processing, vol. 5, no. 2, pp. 266–277, 1997.
[16]H. Jiang, X. Li and C. Liu, “Large margin hidden Markov models for speech recognition,” IEEE Trans. on Audio, Speech and Language Processing, vol. 14, no. 5, pp. 1584-1595, 2006.
[17]S. Katagiri, C.-H. Lee and B.-H. Juang, “New discriminative training algorithms based on the generalized probabilistic descent method,” in Proc. IEEE Workshop Neural Networks for Signal Processing, pp. 299-308, 1991.
[18]S. Katagiri, B.-H. Juang and C.-H. Lee, “Pattern recognition using a family of design algorithms based upon the generalized probabilistic descent method,” Proceedings of the IEEE, vol. 86, no. 11, pp. 2345-2373, 1998.
[19]J. T.-Y. Kwok, "Moderating the outputs of support vector machine classifiers," IEEE Trans. on Neural Networks, vol. 10, no. 5, pp. 1018-1031, 1999.
[20]J. T.-Y. Kwok, “The evidence framework applied to support vector machines,” IEEE Trans. on Neural Networks, vol. 11, no. 5, pp. 1162-1173, 2000.
[21]J. Keshet, S. Shalev-Shwartz, Y. Singer and D. Chazan, “A large margin algorithm for speech-to-phoneme and music-to-score alignment,” IEEE Trans. on Audio, Speech and Language Processing, vol. 15, no. 8, pp. 2373-2382, 2007.
[22]C.-H. Lee and B.-H. Juang, “A survey on automatic speech recognition with an illustrative example on continuous speech recognition of Mandarin,” Computational Linguistics and Chinese Language Processing, vol. 1, no. 1, pp. 1-36, 1996.
[23]K.-F. Lee and H.-W. Hon, “Speaker-independent phone recognition using hidden Markov models,” IEEE Trans. on Acoustics, Speech, and Signal Processing, vol. 37, no. 11, pp. 1641–1648, 1988.
[24]X. Li, H. Jiang and C. Liu, “Large margin HMMs for speech recognition,” in Proc. Int. Conf. Acoustics, Speech, Signal Processing (ICASSP), vol. 5, pp. 513-516, 2005.
[25]C. Liu, H. Jiang and X. Li, “Discriminative training of CDHMMs for maximum relative separation margin,” in Proc. Int. Conf. Acoustics, Speech, Signal Processing (ICASSP), vol. 1, pp. 101-104, 2005.
[26]C. Liu, H. Jiang, L. Rigazio, “Maximum relative margin estimation of HMMs based on N-best string models for continuous speech,” in Proc. IEEE Workshop Automatic Speech Recognition and Understanding, pp. 420–425, 2005.
[27]C. Liu, H. Jiang and L. Rigazio, “Recent improvement on maximum relative margin estimation of HMMs for speech recognition,” in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing (ICASSP), vol. 1, pp. 269-272, 2006.
[28]J. Li, M. Yuan and C.-H. Lee, “Soft margin estimation of hidden Markov model parameters,” in Proc. Interspeech, pp. 2422-242, 2006.
[29]J. Li, M. Yuan and C.-H. Lee, “Approximate test risk bound minimization through soft margin estimation,” IEEE Trans. on Audio, Speech and Language Processing, vol. 15, no. 8, pp. 2393-2404, 2007.
[30]D. J. C. MacKay, “Bayesian interpolation,” Neural Computation, vol. 4, no. 3, pp. 415-447, 1992.
[31]D. J. C. MacKay, “The Evidence Framework Applied to Classification Networks,” Neural Computation, vol. 4, no. 4, pp. 720-736, 1992.
[32]E. McDermott, T. J. Hazen, J. L. Roux, A. Nakamura and S. Katagiri, “Discriminative training for large-vocabulary speech recognition using minimum classification error,” IEEE Trans. on Audio, Speech and Language Processing, vol. 15, no. 1, pp. 203-223, 2007.
[33]D. Povey and P. C. Woodland, “Minimum phone error and I-smoothing for improved discriminative training,” in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing (ICASSP), pp. 105-108, 2002.
[34]D.W. Purnell and E.C. Botha, “Improved Generalization of MCE Parameter Estimation With Application to Speech Recognition,” IEEE Trans. Speech and Audio Processing, vol. 10, no. 4, pp. 232–239, 2002.
[35]L. Rabiner and B.-H. Juang, Fundamentals of Speech Recognition, Prentice Hall, 1993.
[36]R. Schluter, W. Macherey, B. Muller and H. Ney, "Comparison of discriminative training criteria and optimization methods for speech recognition," Speech Communication, pp. 287-310, 2001.
[37]F. Sha and L. K. Saul, “Comparison of large margin training to other discriminative methods for phonetic recognition by hidden Markov models,” in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing (ICASSP), vol. 4, pp. 313-316, 2007.
[38]F. K. Soong and E.-F. Huang, “A tree-trellis based fast search for finding the N-best sentence hypotheses in continuous speech recognition,” in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing (ICASSP), vol. 1, pp. 705-708, 1991.
[39]V. N. Vapnik, The Nature of Statistical Learning Theory, New-York: Springer-Verlag, 1995.
[40]V. N. Vapnik, Statistical Learning Theory, Wiley, 1998.
[41]P. C. Woodland, J. J. Odell, V. Valtchev and S. J. Young, “Large vocabulary continuous speech recognition using HTK,” in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing (ICASSP), vol. 2, pp. 125-128, 1994.
[42]S. Young, J. Jansen, J. Odell, D. Ollason and P. Woodland, The HTK Book (Version 2.0), 1995.