
National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)

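Graduate student: 何冠宏
Graduate student (English name): Guan-Hong He
Thesis title (Chinese): 基於加強式跨字參考模板之語者獨立孤立詞語音辨識之低成本嵌入式系統設計
Thesis title (English): Speaker-Independent Isolated Word Recognition Based on Enhanced Cross-Words Reference Templates for Low Cost Embedded System Design
Advisor: 王駿發
Advisor (English name): Jhing-Fa Wang
Degree: Master's
Institution: National Cheng Kung University
Department: Department of Electrical Engineering (Master's and Doctoral Program)
Discipline: Engineering
Field: Electrical and Computer Engineering
Thesis type: Academic thesis
Year of publication: 2012
Academic year of graduation: 100 (ROC calendar)
Language: English
Number of pages: 47
Keywords (Chinese): 參考模板; 動態時間校準; 語者獨立; 孤立詞語音辨識
Keywords (English): reference templates; dynamic time warping; speaker independent; isolated word recognition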
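Abstract (Chinese, translated): This thesis proposes novel enhanced cross-words reference templates and applies them to a speaker-independent isolated word speech recognition system. An enhanced cross-words reference template is generated from a group of templates. Its generation has two main steps: dynamic time warping (DTW) matching and arithmetic averaging. Because the templates differ in length, they cannot be averaged directly; their lengths must first be made consistent, so dynamic time warping is used to solve this problem. Once matching is complete, the averaging operation is carried out to produce an enhanced cross-words reference template.
Experimental results of the software implementation show that the proposed system, using linear prediction cepstral coefficients (LPCCs) as features, reaches a recognition rate of 98.83% with a vocabulary of 30 command phrases. Cross-words reference templates and conventional reference templates give recognition rates of 97.58% and 93.58%, respectively, so the enhanced cross-words reference templates clearly outperform the other two kinds of template. In addition, experimental results of the hardware implementation show that the average recognition rate is above 90% for both the inside test and the outside test. These results confirm the effectiveness of the proposed idea.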
Abstract (English): In this study, novel enhanced cross-words reference templates (ECWRTs) are proposed and applied to speaker-independent isolated word recognition. An ECWRT is a reference template generated from a set of templates. The main steps of ECWRT generation are DTW matching and averaging. Because the templates differ in length, they cannot be averaged directly. To solve this problem, dynamic time warping (DTW) is used. After DTW matching, the matched frames of the templates are averaged to form the ECWRT. The experimental results of the software implementation show that the proposed system, using linear prediction cepstral coefficients (LPCCs) and a 30-word vocabulary, achieves an average accuracy of 98.83%. This recognition rate is higher than the 97.58% and 93.58% obtained with CWRTs and conventional reference templates, respectively. Moreover, the experimental results of the hardware implementation indicate that the average recognition rates for the inside and outside tests are both higher than 90%. The experimental results demonstrate the effectiveness of the proposed idea.
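The two generation steps named in the abstract, DTW matching followed by averaging of the matched frames, can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names (`frame_dist`, `dtw_path`, `average_templates`), the squared Euclidean frame distance, the unconstrained three-way step pattern, and the choice of the first template as the alignment reference are all assumptions made for illustration. The actual ECWRT procedure is described in Chapter 3 of the thesis and is not reproduced in this record.

```python
def frame_dist(a, b):
    """Squared Euclidean distance between two feature frames (assumed metric)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def dtw_path(ref, tpl):
    """Align tpl to ref with basic DTW; return (total cost, warping path).

    The path is a list of (i, j) pairs meaning frame i of ref is matched
    to frame j of tpl. No slope or window constraints are applied.
    """
    n, m = len(ref), len(tpl)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_dist(ref[i - 1], tpl[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])

    # Backtrack from the last cell to recover the matched frame pairs.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [
            (cost[i - 1][j - 1], i - 1, j - 1),  # diagonal
            (cost[i - 1][j], i - 1, j),          # advance ref only
            (cost[i][j - 1], i, j - 1),          # advance tpl only
        ]
        _, i, j = min(steps)
    path.reverse()
    return cost[n][m], path


def average_templates(templates, ref_index=0):
    """Average templates of different lengths after DTW matching.

    Every template is aligned to templates[ref_index] (an assumed choice),
    and all frames matched to a given reference frame are averaged, so the
    result has the same length as the reference.
    """
    ref = templates[ref_index]
    dim = len(ref[0])
    sums = [[0.0] * dim for _ in ref]
    counts = [0] * len(ref)
    for tpl in templates:
        _, path = dtw_path(ref, tpl)
        for i, j in path:
            for k in range(dim):
                sums[i][k] += tpl[j][k]
            counts[i] += 1
    return [[s / counts[i] for s in sums[i]] for i in range(len(ref))]


if __name__ == "__main__":
    # Toy one-dimensional "features"; real templates would hold LPCC vectors.
    short = [[0.0], [2.0], [4.0]]
    stretched = [[0.0], [0.0], [2.0], [4.0], [4.0]]
    print(average_templates([short, stretched]))
```

In this sketch a time-stretched copy of the reference averages back to the reference itself, which is the behaviour that direct frame-by-frame averaging cannot provide when the lengths differ.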
Chinese Abstract
Abstract
Acknowledgements
Content
Table List
Figure List
Chapter 1 Introduction
1.1 Background
1.2 Related Work
1.3 Motivation
1.4 Objectives
1.5 Organization
Chapter 2 System Overview
2.1 System Overview
2.2 Preprocessing
2.2.1 Voice Activity Detection
2.2.2 Automatic Gain Control
2.2.3 Framing
2.3 Feature Extraction
2.3.1 Linear Predictive Coefficients
2.3.2 Linear Prediction Cepstral Coefficients
2.4 Dynamic Time Warping
Chapter 3 ECWRT: Enhanced Cross-Words Reference Template
3.1 Overview of Data Training
3.2 Template Preparation
3.3 Reference Template
3.4 ECWRT Generation
Chapter 4 Low Cost Embedded System Design
4.1 Introduction to GPCE063A
4.2 System Overview of the Low Cost Embedded System Design
4.3 Fixed Point Design for the Low Cost Embedded System
4.3.1 Introduction to Fixed Point Design
4.3.2 Fixed Point Design for Feature Extraction
4.3.3 Fixed Point Design for Dynamic Time Warping
Chapter 5 Experimental Results
5.1 Introduction to Experimental Environment
5.1.1 Training Database
5.1.2 Hardware Platform
5.2 Experimental Results of Software Implementation
5.3 Experimental Results of Hardware Implementation
5.3.1 Performance Evaluation of Proposed System on GPCE063A
5.3.2 Average Recognition Rates
Chapter 6 Conclusions and Future Works
6.1 Conclusions
6.2 Future Works
References
About the Author


