
National Digital Library of Theses and Dissertations in Taiwan

Detailed Record

Author: Tsung-Chih Wang (王聰智)
Title: Design of Modular Scalable HMM-based Continuous Speech Recognition IP (可模組化擴充之馬可夫模型連續語音辨識IP設計)
Advisor: Jer-Min Jou (周哲民)
Degree: Master's
Institution: National Cheng Kung University
Department: Department of Electrical Engineering
Discipline: Engineering
Field of study: Electrical and Computer Engineering
Thesis type: Academic thesis
Year of publication: 2002
Academic year of graduation: 90 (2001–2002)
Language: English
Number of pages: 115
Keywords (Chinese): error control code; speech recognition; convolutional decoder
Keywords (English): speech recognition; error control code; convolutional code
Usage statistics:
  • Cited by: 0
  • Views: 353
  • Downloads: 64
  • Bookmarked: 0
  This thesis introduces a design method for an IP with modular scalability. The IP provides three main functions: (i) HMM-based continuous speech recognition; (ii) convolutional decoding for error control coding; (iii) modular scalability. Because the computation kernels of HMM-based continuous speech recognition and convolutional decoding are very similar, a shared-circuit design is used to integrate the circuits required by both applications into a single IP. In addition, to meet the needs of a variety of speech recognition applications, we developed an IP architecture with modular scalability: the recognizable vocabulary can be extended simply by connecting multiple IPs together, with no need to modify the IP's code.
  In the IP's hardware circuit design, pipelined and parallel techniques are adopted to reduce latency while raising overall performance. For survivor memory management, the survivor memory is partitioned into four blocks so that, in a pipeline-like fashion, the three survivor-path operations (survivor data write, traceback read, and decode read) can access the memory simultaneously; this considerably improves memory access efficiency.
  After overall planning and design considerations, a hardware/software co-design approach is used to implement the speech recognition system and the error control coding system. Preprocessing of the speech data and generation of the coded sequences are handled by software, together with a windowed user interface. The hardware part is downloaded to an FPGA verification platform, where the IP is used as a complete ASIC; communication across the hardware/software interface over the ISA bus completes the actual operation and verification of the IP.
  This thesis presents a design method for a modular, scalable HMM-based continuous speech recognition IP. The IP provides three major functions: (i) Hidden Markov Model (HMM) based continuous speech recognition; (ii) convolutional decoding for error control coding; (iii) modular, scalable IP design. Since the recognition kernel of an HMM-based speech recognition system and the decoding kernel of a convolutional coding system are similar, we integrate the two functions into one IP that shares the same hardware modules. In addition, to satisfy the vocabulary-size requirements of most speech recognition applications, we develop a modular, scalable IP architecture in which the number of recognizable words can be increased by cascading speech recognition IPs and extension modules.
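The structural similarity between the two kernels comes down to the Viterbi add-compare-select (ACS) recurrence, which both applications iterate stage by stage over a trellis. The following is a minimal software sketch of that shared step, for illustration only; it is not the thesis's circuit, and all names are our own:

```python
def viterbi_acs(prev_metrics, transitions):
    """One add-compare-select (ACS) step of the Viterbi algorithm.

    prev_metrics: dict mapping each state to its accumulated path metric
                  (negative-log domain, so smaller is better).
    transitions:  list of (src_state, dst_state, branch_metric) edges for
                  the current trellis stage.
    Returns (new_metrics, decisions), where decisions[dst] records the
    surviving predecessor state -- the information a hardware survivor
    memory unit would store for later traceback.
    """
    new_metrics, decisions = {}, {}
    for src, dst, branch in transitions:
        cand = prev_metrics[src] + branch                       # add
        if dst not in new_metrics or cand < new_metrics[dst]:   # compare
            new_metrics[dst] = cand                             # select
            decisions[dst] = src
    return new_metrics, decisions
```

For convolutional decoding the branch metric would be a Hamming (or Euclidean) distance between the received and expected symbols; for HMM decoding it would be the negative log of the transition probability times the observation likelihood. This is why a single negative-log-domain ACS datapath can serve both applications.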
  In the IP architecture design, both pipelining and parallel techniques are adopted in the transition metric and path metric calculations to reduce computation latency. For survivor memory management, we divide the survivor memory into four equal-sized blocks, so that the three survivor-path memory access tasks (survivor data write, traceback read, and decode read) can proceed at the same time.
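The traceback operation that the survivor memory serves can be illustrated in software. This is a minimal sketch under our own naming; it ignores the four-bank partitioning, which matters only for concurrent access in hardware:

```python
def traceback(decisions, final_state):
    """Recover the survivor path from per-stage decision records.

    decisions:   list with one dict per trellis stage, mapping each state
                 to its surviving predecessor (as an ACS unit would emit).
    final_state: the state holding the best metric at the last stage.
    Returns the full state sequence from the first stage to the last.
    """
    path = [final_state]
    for stage in reversed(decisions):   # walk the trellis backwards
        path.append(stage[path[-1]])    # follow the stored predecessor
    path.reverse()                      # present it in forward order
    return path
```

In a banked hardware design, one block is being written with fresh decisions while older blocks are simultaneously read by this kind of backward walk, one pointer locating the path and another emitting decoded output.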
  For system integration, we adopt a hardware/software co-verification method to integrate the IP with the software part and build the speech recognition system and the error control coding system. Communicating over the ISA bus, the IP and the software complete the full set of verification tasks.
ABSTRACT
CONTENTS
LIST OF TABLES
LIST OF FIGURES
CHAPTER 1 INTRODUCTION
  1.1 Background
  1.2 Motivation
  1.3 Thesis Organization

CHAPTER 2 MAXIMUM LIKELIHOOD DECODING OF CONVOLUTIONAL CODES
  2.1 Definition of Convolutional Encoder
  2.2 The Trellis and State Diagram
  2.3 Maximum Likelihood Decoder for Convolutional Codes ― The Viterbi Algorithm
  2.4 Practical Design Considerations of the Viterbi Algorithm
  2.5 Good Convolutional Codes for Viterbi Decoding

CHAPTER 3 PRINCIPLES OF HIDDEN MARKOV MODEL
  3.1 Definition of Hidden Markov Model
  3.2 HMM Evaluation ― The Forward and Backward Algorithm
    3.2.1 The Forward Procedure
    3.2.2 The Backward Procedure
  3.3 HMM Decoding ― The Viterbi Algorithm
  3.4 HMM Parameter Estimation ― The Baum-Welch Algorithm
  3.5 Continuous Observation Densities in HMM

CHAPTER 4 HIDDEN MARKOV MODEL-BASED CONTINUOUS SPEECH RECOGNITION
  4.1 Speech Feature Extraction
    4.1.1 Cepstrum Analysis
    4.1.2 Evaluation of Mel-Frequency Cepstrum Coefficients
  4.2 Training HMM Models
    4.2.1 HMM Training Procedure
    4.2.2 Left-to-Right Property
    4.2.3 Modified K-Means Algorithm
  4.3 Using HMM to Recognize
    4.3.1 Modified Viterbi Algorithm
    4.3.2 Negative-Logarithm Modified Viterbi Algorithm

CHAPTER 5 DESIGN OF AN INTEGRATED-SCALABLE IP FOR SPEECH RECOGNITION AND CONVOLUTIONAL DECODING
  5.1 Key Features of Speech Recognition IP
  5.2 Configuration Information and Parameters
    5.2.1 Parameters in Convolutional Code
    5.2.2 Parameters in Continuous Speech Recognition
  5.3 Architecture Overview
  5.4 Modular Scalable IP Design
  5.5 Circuit Design of Pipelined Transition Metric Unit
    5.5.1 Transition Metric Derivation
    5.5.2 Analysis of Fixed-Point Hardware Design
    5.5.3 PLTMU Architecture Design
  5.6 Circuit Design of Hard-Decision Branch Metric Unit
  5.7 Circuit Design of Parallel Add-Compare-Select Unit
  5.8 Circuit Design of Survivor Memory Unit
    5.8.1 Register Exchange Algorithm
    5.8.2 Traceback Algorithm
    5.8.3 The Traceback Strategy in Our IP Design

CHAPTER 6 PROTOTYPING SYSTEM AND EXPERIMENT RESULTS
  6.1 Method of Building a Prototype System
  6.2 The Rapid Hardware Prototyping System
  6.3 Design of a Convolutional Coding System
  6.4 Design of a Continuous Speech Recognition System
  6.5 Synthesis Results
  6.6 Simulation Results
    6.6.1 Simulation Results of Our Convolutional Coding System
    6.6.2 Simulation Results of Speech Recognition System
  6.7 Experimental Results
    6.7.1 Bit Error Rate Analysis
    6.7.2 Recognition Rate Measurement
    6.7.3 Speech Recognition Rate Analysis
    6.7.4 Execution Time Analysis of Speech Recognition System

CHAPTER 7 CONCLUSION
REFERENCES