National Digital Library of Theses and Dissertations in Taiwan

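Author: 周至宏 (Zhi-Hong Zhou)
Thesis title: Research and Implementation of Fixed-Point Algorithm for HTK Kernel Modules Based on ARM920T (基於ARM920T之HTK關鍵模組定點數演算法研究與實現)
Advisor: 王駿發 (Jhing-Fa Wang)
Degree: Master's
Institution: National Cheng Kung University
Department: Department of Electrical Engineering (master's and doctoral program)
Discipline: Engineering
Field: Electrical and Computer Engineering
Thesis type: Academic thesis
Year of publication: 2009
Academic year of graduation: 97 (ROC calendar, 2008-2009)
Language: English
Pages: 59
Chinese keywords: speech recognition system; HTK kernel modules; embedded platform; fixed-point algorithm
English keywords: HTK Kernel Modules; Speech Recognition; Embedded System; Fixed-point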
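In recent years, mobile devices have grown increasingly complex, and for ease of operation the human-machine interface has become increasingly important. Interfaces ranging from basic buttons to handwriting input and speech input have been developed in turn. For people, speech input is a very natural interface, as natural as talking with another person: even young children who know few written characters, or older adults with declining eyesight, can communicate with a machine through simple spoken expressions.
In speech recognition systems, hidden Markov models (HMMs) are currently the most commonly used recognition method and deliver excellent recognition performance. An HMM uses a probabilistic model to describe a sequence of pronunciations: the pronunciation of a short speech segment is regarded as a series of consecutive state transitions among probability models, and the features observed in each frame are regarded as the output produced in a given state.
HTK is a hidden Markov model (HMM) development toolkit created by the Speech Vision and Robotics Group of the Cambridge University Engineering Department. For building HMM-based speech recognition systems, HTK provides complete functionality and open source code. Powerful as HTK is, speech recognition inevitably involves a very large number of floating-point arithmetic operations and a great deal of data processing, which is usually a major obstacle for applications on mobile devices.
This thesis therefore studies fixed-point mathematical algorithms suited to embedded platforms and applies them to the HTK kernel modules. We tested them on the S3C2440A chip, which is built around an ARM920T core. The experimental results show speed-ups of 60 times for "Square root", 200 times for "Logarithm", 15 times for "Pre-emphasize", 102 times for "Hamming Window", 38 times for "Real FFT", and 900 times for "DCT". Overall, the proposed fixed-point algorithms effectively improve the computational performance of embedded devices.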
In recent years, the functions of mobile devices (embedded systems) have become more and more complicated. To make them convenient to operate, designing a friendly user interface is essential. Various user interfaces, ranging from the most basic buttons to more complicated handwriting and speech input, have already been developed.
Using speech as the human-computer interface is the most natural way. Even children who have not yet learned much written language and elderly people with declining eyesight can communicate with machines through simple spoken expressions.
Hidden Markov models (HMMs) are a popular method for implementing speech recognition systems. An HMM uses probability models to describe the phenomenon of pronunciation. The pronunciation process of a speech segment can be regarded as a sequence of consecutive state transitions in an HMM, and the frame features can be treated as the symbols emitted by an HMM state.
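As a rough illustration of the state-transition view described above, the following C sketch finds the most likely state path through a toy discrete HMM with the Viterbi recursion. It is not taken from the thesis or from HTK: the `ToyHmm` type, the `viterbi_cost` function, and the model sizes are hypothetical, and probabilities are assumed to be stored as integer costs (scaled negative log-probabilities), so that the best path is the one with the smallest total cost.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy sizes; a real recognizer has many more states and
   usually continuous (Gaussian mixture) emissions rather than symbols. */
enum { N_STATES = 2, N_SYMBOLS = 2, MAX_T = 64 };

/* All entries are integer costs, assumed to be scaled -log(probability). */
typedef struct {
    int32_t init[N_STATES];            /* cost of starting in state i     */
    int32_t trans[N_STATES][N_STATES]; /* cost of moving from i to j      */
    int32_t emit[N_STATES][N_SYMBOLS]; /* cost of state j emitting symbol */
} ToyHmm;

/* Viterbi search: returns the minimum total cost over all state paths for
   obs[0..t_len-1] and writes the best state sequence into path[0..t_len-1]. */
int32_t viterbi_cost(const ToyHmm *m, const int *obs, size_t t_len, int *path)
{
    int32_t delta[N_STATES], next[N_STATES];
    int back[MAX_T][N_STATES];

    assert(t_len >= 1 && t_len <= MAX_T);

    /* Initialization: start cost plus the cost of emitting the first symbol. */
    for (int j = 0; j < N_STATES; j++)
        delta[j] = m->init[j] + m->emit[j][obs[0]];

    /* Recursion: for each frame, keep the cheapest predecessor of each state. */
    for (size_t t = 1; t < t_len; t++) {
        for (int j = 0; j < N_STATES; j++) {
            int best = 0;
            for (int i = 1; i < N_STATES; i++)
                if (delta[i] + m->trans[i][j] < delta[best] + m->trans[best][j])
                    best = i;
            next[j] = delta[best] + m->trans[best][j] + m->emit[j][obs[t]];
            back[t][j] = best;
        }
        for (int j = 0; j < N_STATES; j++)
            delta[j] = next[j];
    }

    /* Termination: pick the cheapest final state, then trace back. */
    int last = 0;
    for (int j = 1; j < N_STATES; j++)
        if (delta[j] < delta[last])
            last = j;

    path[t_len - 1] = last;
    for (size_t t = t_len - 1; t > 0; t--)
        path[t - 1] = back[t][path[t]];
    return delta[last];
}
```

Working with additive integer costs instead of multiplying floating-point probabilities is one common way to keep such a search free of floating-point operations, which fits the fixed-point theme of the thesis; whether the thesis itself takes this route is not stated in the abstract.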
HTK, developed by the Speech Vision and Robotics Group of the Cambridge University Engineering Department, is a toolkit for building hidden Markov models. HTK provides a full development environment and complete source code for building an HMM-based speech recognition system. Nevertheless, speech recognition often requires a large number of floating-point arithmetic and control operations. This makes real-time speech recognition on a mobile device a challenging problem.
Hence, this thesis addresses a “fixed-point speech recognition algorithm which is suitable for embedded systems”. The floating-point HTK modules are transformed into fixed-point ones. The experiments were conducted on an S3C2440A with an ARM920T core. In our experimental results, the speed-up in “Square root” was 60 times; the speed-up in “Logarithm” was 200 times; the speed-up in “Pre-emphasize” was 15 times; the speed-up in “Hamming window” was 102 times; the speed-up in “Real FFT” was 40 times; the speed-up in “DCT” was 900 times. The proposed fixed-point algorithm thus greatly improves the computational efficiency of embedded devices.
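To give a flavour of what replacing floating-point routines with fixed-point ones can look like, here is a minimal C sketch of two of the operations named above: pre-emphasis and square root. It is an illustration under stated assumptions, not the thesis's implementation: the Q15 number format, the 0.97 pre-emphasis coefficient, and the function names `q15_mul`, `preemphasize_q15`, and `isqrt32` are all hypothetical choices made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Q15 format (assumed here): a real value x in [-1, 1) is stored as
   round(x * 32768) in a 16-bit signed integer. */
#define Q15_SHIFT 15
#define PREEMPH_Q15 31785 /* 0.97 * 32768, rounded; 0.97 is an assumption */

/* Multiply a 16-bit sample by a Q15 coefficient, rounding half away from
   zero. Only non-negative values are shifted, so the result is portable. */
static int32_t q15_mul(int16_t coeff, int16_t sample)
{
    int32_t p = (int32_t)coeff * sample;
    int32_t half = 1 << (Q15_SHIFT - 1);
    return p >= 0 ? (p + half) >> Q15_SHIFT : -((-p + half) >> Q15_SHIFT);
}

/* Clamp a 32-bit intermediate result to the int16_t range. */
static int16_t sat16(int32_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* In-place pre-emphasis y[n] = x[n] - a * x[n-1] with integer arithmetic.
   Iterating backwards reads x[n-1] before it is overwritten. The first
   sample is left unchanged, which is a simplification of this sketch. */
void preemphasize_q15(int16_t *s, size_t n)
{
    assert(s != NULL || n == 0);
    for (size_t i = n; i-- > 1; )
        s[i] = sat16((int32_t)s[i] - q15_mul(PREEMPH_Q15, s[i - 1]));
}

/* Integer square root: the largest r with r * r <= x, computed bit by bit
   using only shifts, additions, and comparisons. */
uint32_t isqrt32(uint32_t x)
{
    uint32_t r = 0;
    uint32_t bit = 1u << 30; /* largest power of four that fits in 32 bits */

    while (bit > x)
        bit >>= 2;
    while (bit != 0) {
        if (x >= r + bit) {
            x -= r + bit;
            r = (r >> 1) + bit;
        } else {
            r >>= 1;
        }
        bit >>= 2;
    }
    return r;
}
```

For example, calling `preemphasize_q15` on the samples `{1000, 1000, -1000, 0}` yields `{1000, 30, -1970, 970}`, and `isqrt32(15)` returns 3. Routines of this kind avoid software-emulated floating point on cores without a floating-point unit, which is the general reason fixed-point conversion can pay off on embedded hardware.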
ABSTRACT (CHINESE) I
ABSTRACT III
ACKNOWLEDGMENTS V
CHAPTER 1 INTRODUCTION 1
1.1 BACKGROUND AND MOTIVATION 1
1.2 ORGANIZATION OF THESIS 2
CHAPTER 2 OVERVIEW OF HIDDEN MARKOV MODEL TOOLKIT 3
2.1 INTRODUCTION OF HTK 3
2.2 HTK LIB ARCHITECTURE 3
2.3 THE TECHNOLOGY OF SPEECH RECOGNITION 4
2.3.1 Speech Feature extraction 4
2.3.2 Building Acoustical Models 5
2.3.3 Recognition 7
2.4 SPEECH RECOGNITION VIA HTK 8
2.4.1 Feature Extraction 8
2.4.2 Training Process 12
2.4.3 Recognition Process 17
CHAPTER 3 MFCC EXTRACTION METHOD 19
3.1 OVERVIEW OF MFCC 19
3.1.1 Frame Blocking 20
3.1.2 Log-Energy 20
3.1.3 Pre-Emphasis 21
3.1.4 Hamming Window 22
3.1.5 FFT 23
3.1.6 Triangular Band-pass Filters 26
3.1.7 DCT 27
3.1.8 Delta Cepstrum Coefficients 28
3.2 FIXED POINT MFCC 29
3.2.1 Basic Theory of Fixed Point Mathematics 29
3.2.2 Log Energy 30
3.2.3 Pre-Emphasis 31
3.2.4 Hamming-window 32
3.2.5 FFT 32
3.2.6 Triangular Band-pass Filters & Log Energy 34
3.2.7 DCT 35
3.2.8 Logarithm 38
3.2.9 Euclid distance 40
CHAPTER 4 ENVIRONMENT SETTING AND TEST PLATFORM 42
4.1 INTRODUCTION TO HW AND SW PLATFORM 42
4.2 ENVIRONMENT SETTING 43
4.2.1 Argument of MFCC 43
4.2.2 Accuracy Testing Process 43
4.2.3 Performance Testing Process 45
4.2.4 Experimental Training Corpora 45
CHAPTER 5 EXPERIMENTAL RESULTS AND ANALYSIS 47
5.1 PRECISION 47
5.2 PERFORMANCE 56
CHAPTER 6 CONCLUSION AND FUTURE WORKS 57
REFERENCES 58