Graduate Student: 周佑霖
Graduate Student (English): Yu-Lin Chou
Thesis Title (Chinese): 以結合決策樹與GCVHMM為基礎之不特定語者中文連續數字語音辨識
Thesis Title (English): A Technique for Speaker Independent Automatic Speech Recognition Based on Decision Tree State Tying with GCVHMM
Advisors: 林進燈, 周志成
Advisors (English): Chin-Teng Lin, Chi-Cheng Jou
Degree: Master's
Institution: 國立交通大學 (National Chiao Tung University)
Department: 電機與控制工程系 (Department of Electrical and Control Engineering)
Discipline: Engineering
Field: Electrical and Information Engineering
Thesis Type: Academic thesis
Year of Publication: 2002
Graduation Academic Year: 90
Language: English
Number of Pages: 70
Keywords (Chinese): 馬可夫模型 (Markov model), 語音辨識 (speech recognition)
Keywords (English): HMM, Speech Recognition
Usage statistics:
  • Cited by: 1
  • Views: 206
  • Rating: (none)
  • Downloads: 0
  • Bookmarked: 1
The main purpose of this thesis is to study the recognition of continuous Mandarin speech. In the field of continuous-speech recognition, many algorithms have been proposed to solve the recognition problem; one of them is the one-state algorithm. This thesis focuses on remedying two major problems of the one-state algorithm. The first is that, when the one-state algorithm performs recognition, the quality of the reference models themselves severely affects its recognition rate. Since good reference models effectively improve continuous-speech recognition, this thesis proposes a principal-subspace hidden Markov model, the generalized common-vector HMM (GCVHMM), to improve the recognition performance of the reference models themselves.
Next, continuous-speech recognition is introduced. Because the model of a sound differs slightly when its neighboring sounds differ, we arrive at the concept of context dependence. To achieve continuous-speech recognition, a separate model must therefore be built for each sound in each different context. Not only is the resulting number of models impractically large, but the training data we can collect can rarely cover every context, leaving some context-dependent models without training data. We therefore use a decision tree to reduce the computational load and to address the shortage of training data, striking an appropriate balance between model complexity and the amount of speech training data that can be collected.
Finally, all the methods proposed in this thesis are applied to the recognition of continuous Mandarin digits. The experimental results show an improvement of up to 26.039% in recognition rate over the original recognition system.

This thesis proposes a new speech recognition technique for speaker-independent recognition of continuously spoken Mandarin digits. One popular tool for such a problem is the HMM-based one-state algorithm, a connected-word pattern-matching method. However, two problems with this conventional method prevent its practical use on our target task. One is the lack of a proper mechanism for selecting acoustic models that are robust across speakers. The other is that the acoustic models do not capture intersyllable co-articulatory effects.
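To make the connected-word pattern matching concrete, the sketch below shows a simplified one-pass Viterbi search over a loop of whole-word HMMs. It is only an illustration under stated assumptions (discrete emissions, a zero-cost word transition, toy digit labels), not the recognizer used in this thesis.

```python
# A minimal sketch of connected-word Viterbi decoding in the spirit of the
# one-state / one-pass algorithm: whole-word HMMs with DISCRETE emissions,
# where the best word-ending hypothesis at frame t-1 may re-enter state 0 of
# any word model at frame t.  Class/function names, the toy digit models, and
# the zero-cost word transition are illustrative assumptions, not the thesis code.
import numpy as np

NEG = -1e30  # effectively log(0)

class WordHMM:
    """Left-to-right whole-word HMM; the last state is treated as the word end."""
    def __init__(self, name, log_A, log_B):
        self.name = name      # word label, e.g. one Mandarin digit
        self.log_A = log_A    # (S, S) log transition probabilities
        self.log_B = log_B    # (S, K) log emission probabilities over K symbols

def one_pass_decode(models, obs):
    """Return (best word sequence, log score) for a list of observation symbols."""
    n_states = [m.log_A.shape[0] for m in models]
    delta = [np.full(S, NEG) for S in n_states]          # best log score per state
    hist = [[[] for _ in range(S)] for S in n_states]    # completed words per state

    for t, o in enumerate(obs):
        # Best word-ending hypothesis available for a cross-word transition.
        if t == 0:
            end_score, end_hist = 0.0, []
        else:
            ends = [(delta[i][-1], hist[i][-1] + [models[i].name])
                    for i in range(len(models))]
            end_score, end_hist = max(ends, key=lambda e: e[0])

        new_delta, new_hist = [], []
        for m_i, m in enumerate(models):
            S = n_states[m_i]
            nd = np.full(S, NEG)
            nh = [[] for _ in range(S)]
            for s in range(S):
                # Within-word transitions from the previous frame.
                cands = [(delta[m_i][p] + m.log_A[p, s], hist[m_i][p])
                         for p in range(S)]
                if s == 0:  # cross-word entry into the first state
                    cands.append((end_score, end_hist))
                best_score, best_hist = max(cands, key=lambda c: c[0])
                nd[s] = best_score + m.log_B[s, o]
                nh[s] = best_hist
            new_delta.append(nd)
            new_hist.append(nh)
        delta, hist = new_delta, new_hist

    finals = [(delta[i][-1], hist[i][-1] + [models[i].name])
              for i in range(len(models))]
    score, words = max(finals, key=lambda f: f[0])
    return words, score

# Toy usage: two 2-state word models over a 3-symbol codebook (made-up numbers).
A = np.log(np.array([[0.6, 0.4], [1e-6, 1.0]]))
B1 = np.log(np.array([[0.8, 0.1, 0.1], [0.1, 0.8, 0.1]]))
B2 = np.log(np.array([[0.1, 0.1, 0.8], [0.8, 0.1, 0.1]]))
print(one_pass_decode([WordHMM("yi", A, B1), WordHMM("er", A, B2)], [0, 1, 2, 0]))
```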
First, a generalized common-vector (GCV) approach is developed, based on the eigenanalysis of the covariance matrix, to extract a feature that is invariant across speakers as well as to acoustic environment effects and phase or temporal differences. The GCV scheme is then integrated into the conventional HMM to form a new GCV-based HMM, called GCVHMM, which is well suited to speaker-independent recognition.
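The GCV construction rests on the eigenanalysis of a within-class covariance matrix. The numpy sketch below illustrates the underlying common-vector idea under simplifying assumptions; the function name, the synthetic features, and the number of retained directions are illustrative choices, not the thesis code. The eigenvectors with the smallest within-class eigenvalues span a near-invariant subspace, and projecting any feature vector of the class onto it yields approximately the same common component.

```python
# Sketch of the common-vector idea behind GCV (illustration only): eigen-
# decompose the within-class covariance of one word's feature vectors, keep
# the directions with the SMALLEST eigenvalues (the near-invariant subspace),
# and project onto that subspace.  The synthetic 12-dimensional features and
# the choice keep=4 are assumptions.
import numpy as np

def common_vector(features, keep):
    """features: (N, D) vectors of the same word from different speakers.
    keep: number of least-variation directions treated as speaker-invariant."""
    mean = features.mean(axis=0)
    centered = features - mean
    cov = centered.T @ centered / len(features)     # within-class covariance
    eigvals, eigvecs = np.linalg.eigh(cov)          # eigenvalues in ascending order
    U = eigvecs[:, :keep]                           # near-invariant directions
    # Projecting any member of the class onto this subspace yields roughly the
    # same vector: the class's common (speaker-invariant) component.
    return U @ (U.T @ features[0]), U

rng = np.random.default_rng(0)
feats = rng.normal(size=(40, 12))   # stand-in for cepstral vectors of one word
c_vec, U = common_vector(feats, keep=4)
```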
For the second problem, context-dependent modeling is used to account for the co-articulatory effects of neighboring phones. This is important because co-articulation in continuous speech is significantly stronger than in isolated utterances. However, modeling the variations of sounds and pronunciations generates numerous context-dependent models, and if the parameters of those models are all kept distinct, the total number of model parameters becomes very large. To solve these problems, the decision-tree state-tying technique is used to reduce the number of parameters and hence the computational complexity.
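To make the state-tying step concrete, the sketch below shows a standard greedy decision-tree clustering of context-dependent states under a single-Gaussian approximation. It illustrates the general technique rather than the exact procedure of Chapter 4, and the state representation, question format, and stopping threshold are assumptions.

```python
# A minimal sketch of decision-tree state tying under a single-Gaussian,
# sufficient-statistics approximation (illustration of the general technique,
# not the exact procedure of Chapter 4).  Each state is represented as
# (context, mean, var, count) with diagonal variance; questions are yes/no
# predicates on the context.
import numpy as np

def node_loglike(states):
    """Approximate log likelihood of modelling all frames of `states`
    with a single diagonal Gaussian built from their pooled statistics."""
    counts = np.array([s[3] for s in states], dtype=float)
    means = np.array([s[1] for s in states])
    varis = np.array([s[2] for s in states])
    n = counts.sum()
    mu = (counts[:, None] * means).sum(axis=0) / n
    var = (counts[:, None] * (varis + means**2)).sum(axis=0) / n - mu**2
    return -0.5 * n * (np.log(2.0 * np.pi * var) + 1.0).sum()

def split_node(states, questions, threshold):
    """Recursively split a pool of context-dependent states.
    Returns a list of leaves; the states in one leaf share a single tied state."""
    base = node_loglike(states)
    best = None
    for q in questions:                          # q: context -> bool
        yes = [s for s in states if q(s[0])]
        no = [s for s in states if not q(s[0])]
        if not yes or not no:
            continue
        gain = node_loglike(yes) + node_loglike(no) - base
        if best is None or gain > best[0]:
            best = (gain, yes, no)
    if best is None or best[0] < threshold:
        return [states]                          # stop splitting: tie these states
    return (split_node(best[1], questions, threshold) +
            split_node(best[2], questions, threshold))

# Example question (hypothetical): "is the left context a nasal initial?"
# questions = [lambda ctx: ctx[0] in {"m", "n"}]
```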
In our experiments on the recognition of speaker-independent continuous speech, the proposed scheme is shown to increase the average recognition rate of the conventional HMM-based one-state algorithm by over 26.039%, without using any grammar or lexical information.

Chapter 1 Introduction………………………………………………1
1.1 Motivation…………………………………………………………1
1.1.1 Literature Survey………………………………………………2
1.1.2 Research Objectives and Organization of Thesis………4
Chapter 2 Hidden Markov Model
2.1 General Structure of HMM………………………………………7
2.1.1 Hidden Markov Model…………………………………………7
2.1.2 The Output Probability Distribution……………………9
2.1.3 Elements of an HMM…………………………………………10
2.2 Three Basic Issues for HMMs…………………………………11
2.2.1 Issue 1: Probability Evaluation…………………………11
2.2.1.1 The Forward Procedure……………………………………11
2.2.1.2 The Backward Procedure…………………………………12
2.2.2 Issue 2: “Optimal” State Sequence……………………12
2.2.2.1 Viterbi Algorithm…………………………………………13
2.2.3 Issue 3: Parameter Estimation……………………………14
2.2.3.1 Auxiliary Function and Reestimation Algorithm……14
2.2.3.2 Maximization of the Auxiliary Function……………15
Chapter 3 Generalized Common Vector-based HMM………………19
3.1 Introduction……………………………………………………19
3.2 Review of Common Vector Approach…………………………20
3.2.1 Common vector approach……………………………………20
3.2.2 Relationship of CVA to Eigenanalysis…………………23
3.2.2.1 Eigenanalysis………………………………………………23
3.2.2.2 Principal component analysis…………………………24
3.2.2.3 CVA by eigenanalysis……………………………………25
3.3 Generalized Common Vector (GCV)……………………………26
3.4 Generalized Common Vector-based HMM (GCVHMM)…………28
3.4.1 Structure of GCVHMM…………………………………………29
3.4.2 Reestimation algorithm for the parameters of GCVHMM…30
Chapter 4 A Hybrid Decision Tree-based State Tying with GCVHMM for Continuous Speaker-Independent Mandarin Digits Recognition………………………………………………………………34
4.1 Introduction………………………………………………………34
4.2 Context-Dependent Acoustic Model……………………………37
4.3 Parameter Tying……………………………………………………38
4.4 Introduction of Decision Tree State Tying…………………43
4.5 A Hybrid Decision Tree with GCVHMM for Continuous Speaker-Independent Mandarin Digits Recognition………………45
4.5.1 Tied State Left Context GCVHMM System……………………45
4.5.2 Structure of Decision Tree…………………………………47
4.5.3 Question Set……………………………………………………49
4.5.4 Likelihood Computation………………………………………50
4.5.5 Node Splitting and Stop Criteria…………………………55
4.5.6 Tagging Scheme…………………………………………………56
4.5.7 Balanced Tree Structure………………………………………57
Chapter 5 Mandarin Digits Recognition Experiments……………59
5.1 Introduction………………………………………………………59
5.2 Experiments…………………………………………………………59
5.2.1 Database…………………………………………………………59
5.2.2 Experimental Results……………………………………………59
5.2.2.1 Balanced Corpora……………………………………………60
5.2.2.2 Unbalanced Corpora…………………………………………61
5.2.2.3 Balanced and Unbalanced Tree……………………………61
5.3 Summary………………………………………………………………63
Chapter 6 Conclusion…………………………………………………64
Bibliography……………………………………………………………66

