National Digital Library of Theses and Dissertations in Taiwan

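Graduate student: Ping-Cheng Lin (林秉正)
Thesis title: Adaptive Duration Modeling for Speaker Adaptation of Speaking Rate (使用適應性區間模型於語者說話速度之調整)
Advisor: Jen-Tzung Chien (簡仁宗)
Degree: Master's
Institution: National Cheng Kung University
Department: Department of Computer Science and Information Engineering (master's and doctoral program)
Discipline: Engineering
Field: Electrical and Computer Engineering
Thesis type: Academic thesis
Year of publication: 2002
Academic year of graduation: 90 (ROC calendar)
Language: Chinese
Number of pages: 70
Keywords: speech recognition, speaker adaptation, speaking rate, duration model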
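  Variation in speaking rate is one of the key factors affecting speech recognition performance: a mismatch in speaking rate between the training and test data degrades recognition accuracy. Speaking rate differs not only across speakers. Even a single speaker reading the same sentence may produce different speech characteristics, and in particular a different speaking rate, depending on emotional state or health.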

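  We describe speaking-rate characteristics with a duration model. The Viterbi algorithm segments the frames of each utterance into the HMM states, and the number of frames collected in each state is used to train one duration model per state. During recognition, the hidden Markov model (HMM) is extended with the duration model, so that the duration model parameters are scored together with the HMM parameters.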
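
  As a rough illustration of the training step described above, the sketch below collapses a frame-level Viterbi state alignment into per-state frame counts and fits a Gaussian duration model (mean and variance) to each state. The function names, the toy alignments, and the choice of a Gaussian are assumptions made for illustration; the thesis also considers a gamma distribution, and its actual implementation is not shown in this record.

```python
from collections import defaultdict
from itertools import groupby
from statistics import fmean, pvariance


def state_durations(alignment):
    """Collapse a frame-level state alignment into (state, frame_count) runs."""
    return [(state, sum(1 for _ in run)) for state, run in groupby(alignment)]


def fit_duration_models(alignments):
    """Fit one Gaussian duration model (mean, variance) per state.

    `alignments` is a list of utterances, each a list of state indices,
    one per frame, as produced by a Viterbi forced alignment (assumed input).
    """
    samples = defaultdict(list)
    for alignment in alignments:
        for state, frames in state_durations(alignment):
            samples[state].append(frames)
    # Population variance is used here; a sample variance would also be reasonable.
    return {state: (fmean(d), pvariance(d)) for state, d in samples.items()}


# Toy alignments for a 3-state left-to-right model (hypothetical data).
toy_alignments = [
    [0, 0, 0, 1, 1, 2, 2, 2, 2],
    [0, 0, 1, 1, 1, 1, 2, 2],
]
models = fit_duration_models(toy_alignments)
```

  Each entry of `models` maps a state index to the mean and variance of the number of frames spent in that state.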

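  This thesis proposes adapting the duration model with maximum a posteriori (MAP) estimation. The goal of adaptation is to let the system adjust the duration model parameters from a small amount of adaptation data and thereby raise the recognition rate. The method combines the prior probability with the adaptation data at the model level to adapt to the speaker's speaking rate, which effectively improves system performance. The experimental results show that the adapted duration model describes speech at different rates well, so the system maintains its recognition accuracy even when tested on fast speech.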
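
  One common way to realize such an update, assuming a Gaussian duration model with fixed variance and a conjugate Gaussian prior on its mean, is to interpolate between the prior mean and the adaptation data. The sketch below follows that assumption; `tau` is a hypothetical prior weight and the function name is illustrative, not taken from the thesis.

```python
def map_adapt_mean(prior_mean, durations, tau=10.0):
    """MAP estimate of a Gaussian duration mean under a conjugate prior.

    prior_mean: mean of the speaker-independent duration model (in frames).
    durations:  frame counts observed for this state in the adaptation data.
    tau:        assumed prior weight; larger values trust the prior more.
    """
    n = len(durations)
    if n == 0:
        # No adaptation data for this state: keep the prior model.
        return prior_mean
    return (tau * prior_mean + sum(durations)) / (tau + n)


# A fast speaker: the prior expects 10 frames, the adaptation data shows 6.
adapted = map_adapt_mean(10.0, [6, 6, 6, 6, 6], tau=5.0)
```

  With few adaptation samples the estimate stays close to the prior mean, and it moves toward the sample mean as more data arrives, which is the behavior one wants when only a small amount of adaptation data is available.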
  Speaking rate is one source of mismatch between training and testing conditions. Even when the same speaker produces the same utterance, the speech signal, and especially the speaking rate, changes with emotion and other factors. The performance of most speech recognizers degrades when the speaking rate is faster or slower than normal.

  Speaker adaptation is an important technique for improving speech recognition performance. MAP adaptation combines the prior probability with a small amount of adaptation data to adapt the model parameters. A duration model is well suited to describing speaking-rate characteristics. The recognizer estimates both the HMM and the duration model parameters during training. In the adaptation phase, we apply MAP theory to adapt the HMM and duration model parameters together.
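
  A common way to let a recognizer consider both parameter sets, given here only as a sketch and not necessarily the formulation used in the thesis, is to add a weighted duration log-likelihood to the HMM path score. The symbols $X$ (observations), $S$ (state sequence), $\lambda$ (HMM parameters), $d_j$ (number of frames spent in state $j$), $\theta_j$ (duration parameters of state $j$), and the weight $\gamma$ are assumed notation.

```latex
\log \tilde{p}(X, S \mid \lambda, \theta)
  = \log p(X, S \mid \lambda)
  + \gamma \sum_{j} \log p_j(d_j \mid \theta_j)
```

  Here the sum runs over the states visited by $S$, and $p_j$ is the duration distribution of state $j$, for example a Gaussian or a gamma density.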

  This thesis presents a new method for adapting duration model parameters. The MAP adaptation technique is aimed at the problem of varying speaking rate. In the experiments, adapting the duration model parameters significantly improves recognition performance, and the adapted models are more robust when recognizing fast speech.
Chinese Abstract
Abstract
Table of Contents
List of Figures
List of Tables

Chapter 1 Introduction
  1.1 Preface
  1.2 Research Motivation and Objectives
  1.3 Overview of the Research Method
  1.4 Chapter Outline

Chapter 2 Literature Review
  2.1 Preface
  2.2 Speaking Rate Analysis
  2.3 Duration Model Research

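Chapter 3 Mandarin Speech Recognition, Speaker Adaptation, and Speaking Rate
  3.1 Mandarin Speech Recognition and Hidden Markov Models (HMM)
  3.2 Speaker Adaptation Methods
  3.3 Speaking Rate and Speech Recognition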

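Chapter 4 Duration Model Adaptation Methods
  4.1 Duration Model
  4.2 Viterbi Algorithm
  4.3 EM Algorithm
  4.4 Parameter Derivation for the Gaussian Duration Model
  4.5 Parameter Derivation for the Gamma Duration Model
  4.6 MAP Adaptation of the Gaussian Duration Model
  4.7 MAP Adaptation of the Gamma Duration Model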

Chapter 5 Experiments
  5.1 Experimental Setup
  5.2 Experimental Results
  5.3 Discussion
  5.4 System Demonstration

Chapter 6 Conclusions and Future Work
  6.1 Conclusions
  6.2 Future Work

References