
National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)

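Graduate student: Ta-Wei Sun (孫大為)
Thesis title: Fixed-Point Arithmetic Design of Embedded Text-Independent Speaker Recognition System
Thesis title (Chinese): 非文字相關語者辨識嵌入式系統之定點化設計與實現
Advisors: Jhing-Fa Wang (王駿發); Sheau-Fang Lei (雷曉方)
Degree: Master's
Institution: National Cheng Kung University (國立成功大學)
Department: Department of Electrical Engineering, master's and doctoral program
Field: Engineering
Discipline: Electrical Engineering and Computer Science
Thesis type: Academic thesis
Year of publication: 2009
Graduation academic year: 97 (ROC calendar)
Language: English
Number of pages: 61
Keywords (Chinese): 嵌入式系統 (embedded system); 語者辨識 (speaker recognition); 支援向量機 (support vector machines)
Keywords (English): speaker recognition; support vector machines; embedded system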
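Speech signal recognition usually has one of three goals: identifying who is speaking, what is being said, or where the speech takes place. Speaker recognition aims to correctly identify who is speaking, and the technology has a far-reaching effect on the convenience of daily life. In recent years consumer electronics have emphasized compact design and easy system upgrades, so an embedded system, with its low power consumption and high portability, is the better choice for implementation.
This thesis proposes an embedded text-independent speaker recognition system based on Support Vector Machines (SVMs). At its core, the system uses the Linear Prediction Cepstrum Coefficients (LPCC) algorithm to extract speech features, and uses SVMs in place of the traditional Gaussian Mixture Models (GMM) method for speaker model training and recognition.
In implementing the system, and with future digital-home applications in mind, speaker model training and recognition need to be fast. An analysis of the whole system showed that speech feature extraction and the SVM are the most time-consuming parts. Because embedded systems handle floating-point operations inefficiently, we converted these parts of the algorithm to fixed-point arithmetic to accelerate them. The speedup is about 10.88 times in training time and about 6.51 times in recognition time. The error introduced by the fixed-point conversion affects the recognition rate; even so, in experiments with 10 speakers in each of two environments, our fixed-point embedded system still achieves a recognition rate of 93.61%.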
This thesis proposes a fixed-point arithmetic design for an embedded text-independent speaker recognition system. The goal is to make daily life more convenient through portable devices. Linear Prediction Cepstrum Coefficients (LPCC) are used for feature extraction, and a multi-class Support Vector Machines (SVMs) algorithm is adopted for speaker model training and speaker classification.
Speaker model training is time-consuming on a portable device, so the training and classification algorithms must be modified. An analysis of the entire system shows that LPCC extraction and the SVMs carry most of the computational burden, so this thesis proposes a fixed-point implementation of both. The experimental results show that, compared with the floating-point implementation, training is 10.88 times faster and classification is 6.51 times faster. Although truncation error in the fixed-point design lowers recognition accuracy relative to the floating-point design, the proposed system still reaches a speaker recognition accuracy of 93.61%.
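The abstract does not state the word length or fixed-point format used in the thesis. As a minimal sketch of the general idea only, the Python below assumes a Q15 format (15 fractional bits) and uses hypothetical helper names (`to_fixed`, `fixed_mul`, `fixed_dot`). It shows how a real value is quantized to an integer, how a product is rescaled with a shift, and how a dot product (the core of a linear SVM kernel evaluation) can accumulate at higher precision before a single final shift.

```python
import math

Q = 15              # assumed number of fractional bits (Q15); not taken from the thesis
SCALE = 1 << Q      # 2**15 = 32768


def to_fixed(x: float) -> int:
    """Quantize a float to Q15, truncating toward negative infinity."""
    return math.floor(x * SCALE)


def to_float(v: int) -> float:
    """Convert a Q15 integer back to a float for inspection."""
    return v / SCALE


def fixed_mul(a: int, b: int) -> int:
    """Multiply two Q15 values; the product has 30 fractional bits,
    so an arithmetic right shift by Q returns it to Q15."""
    return (a * b) >> Q


def fixed_dot(xs, ys) -> int:
    """Dot product of two Q15 vectors. Products are summed at 30
    fractional bits and shifted once at the end, which truncates
    only once instead of once per term."""
    acc = 0
    for x, y in zip(xs, ys):
        acc += x * y
    return acc >> Q


if __name__ == "__main__":
    a, b = to_fixed(0.5), to_fixed(0.25)
    print(to_float(fixed_mul(a, b)))
    # Truncation error of a value that is not exactly representable
    print(abs(to_float(to_fixed(0.3)) - 0.3))
```

This sketch omits saturation and overflow handling, which a real embedded implementation on a fixed word length would need. Python integers are unbounded, so the accumulator here cannot overflow.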
CHAPTER 1 Introduction
1.1 Background
1.2 Related Work
1.3 Motivation
1.4 Thesis Organization
CHAPTER 2 System Frameworks
2.1 System Overview
2.2 End-Point Detection
2.3 Feature Extraction
2.3.1 Pre-Emphasis
2.3.2 Frame Blocking
2.3.3 Hamming Windows
2.3.4 Linear Predictive Cepstrum Coefficients (LPCC)
2.4 Support Vector Machine Classification
2.4.1 Introduction to Support Vector Machines (SVMs)
2.4.2 Sequential Minimal Optimization Algorithm for SVMs
2.4.3 Testing Phase of Multi-class SVMs
CHAPTER 3 Fixed Point Design for Speaker Identification System
3.1 Introduction to Fixed Point Design
3.2 Fixed Point Design for Feature Extraction
3.2.1 Fixed Point Format
3.2.2 Truncation Error of Floating Point and Fixed Point
3.3 Fixed Point Design for Support Vector Machines
3.3.1 Implementation of Sequential Minimal Optimization
3.3.1.1 Fixed Point Format
3.3.1.2 Truncation Error of Floating Point and Fixed Point
3.3.2 Implementation of SVMs Classification
3.3.2.1 Fixed Point Format
3.3.2.2 Truncation Error of Floating Point and Fixed Point
3.4 Performance Evaluation of Fixed-Point System
3.4.1 Accelerating Approach for “eta”
3.4.2 Comparison of Floating-point System and Fixed-point System
CHAPTER 4 Experimental Results and Comparisons
4.1 Introduction to Experimental Environment
4.2 ARM Performance Evaluation
4.2.1 Introduction to Advanced RISC Machine (ARM) Embedded System
4.2.2 Experimental Results
4.2.2.1 SMD Database Evaluation Results
4.2.2.2 FMMD Database Evaluation Results
CHAPTER 5 Conclusion and Future Work
References