National Digital Library of Theses and Dissertations in Taiwan

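Author: 李東隆
Title: 動態時間校準於聲控選單環境控制系統之應用
Title (English): The Application of Dynamic Time Warping to a Menu-Driven Environmental Control System
Advisor: 陶金旭
Degree: Master's
Institution: National Chung Hsing University (國立中興大學)
Department: Department of Electrical Engineering
Field: Engineering
Discipline: Electrical and Computer Engineering
Thesis type: Academic thesis
Year of publication: 2002
Graduation academic year: 90
Language: Chinese
Number of pages: 69
Keywords: speech recognition; dynamic time warping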
Usage counts:
  • Cited: 0
  • Views: 1403
  • Downloads: 324
  • Bookmarked: 2
Speech recognition is a challenging research area within speech processing. This thesis develops a speech recognition system for use in environmental control systems. To provide a convenient human-machine interface, we propose a speech recognition system for household appliances built on the dynamic time warping (DTW) algorithm.
Feature parameters for speech recognition fall into two main classes: linear predictive coefficients (LPCs) and Mel-frequency cepstral coefficients (MFCCs). MFCCs yield a higher recognition rate, so we adopt them. We combine MFCCs with DTW in speech recognition experiments. The experimental results show a recognition rate of 96.0% when five training speech sample sequences are used for each class.
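The template-matching idea described in the abstract can be illustrated with a minimal sketch. It assumes each utterance has already been converted to a sequence of feature vectors (for example MFCC frames) and that each command class has a few reference templates. The function names, the Euclidean frame distance, and the unweighted three-way step pattern are illustrative assumptions; this record does not state which local path constraints or weights the thesis finally used.

```python
import math


def frame_distance(a, b):
    """Euclidean distance between two feature vectors (assumed frame metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(seq_a, seq_b):
    """Accumulated DTW cost between two sequences of feature vectors.

    Assumptions: both endpoints are fixed, the path is monotonic, the local
    steps are horizontal, vertical, and diagonal with equal weight, and no
    global path window is applied.
    """
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    # cost[i][j] = best accumulated cost aligning seq_a[:i] with seq_b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # vertical step
                                 cost[i][j - 1],      # horizontal step
                                 cost[i - 1][j - 1])  # diagonal step
    return cost[n][m]


def classify(utterance, templates):
    """Return the label of the reference template nearest to the utterance.

    templates maps a label to a list of reference sequences (hypothetical
    layout, e.g. five recordings per command).
    """
    best_label, best_cost = None, float("inf")
    for label, refs in templates.items():
        for ref in refs:
            c = dtw_distance(utterance, ref)
            if c < best_cost:
                best_label, best_cost = label, c
    return best_label
```

With one-dimensional toy features, `classify([[0.0], [0.0], [1.0], [2.0]], {"on": [[[0.0], [1.0], [2.0]]], "off": [[[5.0], [5.0], [5.0]]]})` selects `"on"`, because warping absorbs the repeated first frame at zero cost.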
Chapter 1 Introduction
1-1 Preface
1-2 Overview of Speech Recognition Systems
1-3 Environmental Control Systems
Chapter 2 Speech Feature Parameters
2-1 Sound
2-1-1 Speech Production
2-2 Speech Feature Extraction
2-2-1 Cepstrum
2-3 MFCC
2-3-1 Signal Preprocessing (Preemphasis)
2-3-2 Framing (Windowing)
2-3-3 Spectral Analysis
2-3-4 Filter Bank Processing
2-3-5 Log Energy Computation
2-3-6 Discrete Cosine Transform
2-3-7 Energy and Delta Coefficients
2-3-8 Plots of Experimental Results at Each MFCC Stage
2-4 Simplified MFCC
Chapter 3 Feature Matching: Dynamic Time Warping
3-1 Time Normalization
3-2 Basic Considerations in Dynamic Time Warping: Searching for the Optimal Path
3-3 Constraints on Dynamic Time Warping
3-3-1 Endpoint Constraints
3-3-2 Monotonicity Constraints
3-3-3 Local Path Constraints
3-3-4 Global Path Constraints
3-3-5 Weighting Constraints
3-4 The Dynamic Time Warping Solution
Chapter 4 Experimental Results
4-1 DTW Solutions under Different Weights and Local Path Constraints
4-2 DTW Solutions with Different Numbers of Speech Samples
4-3 DTW Solutions with Different Orders of MFCC Coefficients
4-4 Experimental Results with Multiple Speakers
4-5 Voice-Controlled Menu Environmental Control System Using the Spoken Digits 0-9
4-6 Voice-Controlled Menu Environmental Control System Using Vowels
Chapter 5 Conclusions and Future Work
5-1 Conclusions
5-2 Future Work
References