臺灣博碩士論文加值系統 (National Digital Library of Theses and Dissertations in Taiwan)

Detailed Record

Author: 周文德
Author (romanized): Wun-De Jhou
Title (Chinese): 以支援向量機為基礎之語者識別研究
Title (English): Research of SVM-Based Speaker Identification
Advisor: 吳俊德
Advisor (romanized): Gin-Der Wu
Degree: Master's
Institution: National Chi Nan University (國立暨南國際大學)
Department: Department of Electrical Engineering
Discipline: Engineering
Field: Electrical and Information Engineering
Document type: Academic thesis
Publication year: 2008
Graduation academic year: 96 (2007–2008)
Language: English
Pages: 43
Keywords (Chinese): 支援向量機、語者識別、梅爾倒頻譜參數、主成分分析、線性鑑別分析、高斯混合模型
Keywords (English): support vector machine (SVM), speaker identification, Mel-frequency cepstral coefficients (MFCC), principal component analysis (PCA), linear discriminant analysis (LDA), Gaussian mixture model (GMM)
Record statistics:
  • Cited by: 0
  • Views: 347
  • Rating: (none)
  • Downloads: 74
  • Bookmarked: 0
Abstract (translated from Chinese):
This thesis compares support vector machine (SVM) training models with other modeling approaches, and applies different robustness methods to raise the recognition rate of a speaker identification system. In this study, Mel-frequency cepstral coefficients (MFCC) are computed directly from the speech data and used as the feature parameters for speaker analysis.
However, background noise in everyday environments, such as street noise or factory noise, can severely degrade speaker identification. This thesis applies principal component analysis (PCA) and linear discriminant analysis (LDA) to make the feature parameters more robust, and then builds speaker models with SVM and with Gaussian mixture models (GMM). The system was then used to identify speakers: 20 speakers (10 male, 10 female) provided 4,000 recordings in total, each speaker reading the Chinese digits 0–9 twenty times; 160 recordings per speaker served as reference (training) data and the remainder as test data. Recognition rates were measured under rapidly varying background noise for the different robustness and modeling configurations, and the results were compared and discussed.
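The identification step the abstract describes — scoring a test utterance against each enrolled speaker's Gaussian mixture model and choosing the highest-likelihood speaker — can be sketched as follows. This is a minimal illustration with synthetic, hand-set model parameters (no EM training) and made-up speaker names, not the thesis's actual models:

```python
import numpy as np

def gmm_loglik(frames, weights, means, variances):
    """Total log-likelihood of feature frames under a
    diagonal-covariance Gaussian mixture model.
    frames: (T, D); weights: (M,); means, variances: (M, D)."""
    diff = frames[:, None, :] - means[None, :, :]                    # (T, M, D)
    exponent = -0.5 * np.sum(diff ** 2 / variances, axis=2)          # (T, M)
    log_norm = -0.5 * np.sum(np.log(2 * np.pi * variances), axis=1)  # (M,)
    log_comp = np.log(weights) + log_norm + exponent                 # (T, M)
    # log-sum-exp over mixture components, then sum over frames
    m = log_comp.max(axis=1, keepdims=True)
    return float(np.sum(m[:, 0] + np.log(np.exp(log_comp - m).sum(axis=1))))

def identify(frames, speaker_models):
    """Return the enrolled speaker whose GMM scores highest."""
    scores = {spk: gmm_loglik(frames, *p) for spk, p in speaker_models.items()}
    return max(scores, key=scores.get)

# Two synthetic 2-mixture speaker models in a 3-dim feature space
rng = np.random.default_rng(0)
models = {
    "speaker_A": (np.array([0.5, 0.5]), np.zeros((2, 3)), np.ones((2, 3))),
    "speaker_B": (np.array([0.5, 0.5]), np.full((2, 3), 4.0), np.ones((2, 3))),
}
test_frames = rng.normal(loc=4.0, scale=1.0, size=(50, 3))  # lies near speaker_B
print(identify(test_frames, models))  # → speaker_B
```

In a real system the frames would be MFCC vectors and the model parameters would come from EM training on each speaker's reference recordings.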
Abstract (English):
This thesis investigates support vector machine (SVM) based speaker models, compares them with other modeling approaches, and applies different robustness methods to improve the performance of a speaker identification system. In this study, Mel-frequency cepstral coefficients (MFCC) are extracted from the speech data as the features for speaker identification.
However, background noise in real environments, such as street or factory noise, can degrade identification performance. The thesis employs principal component analysis (PCA) and linear discriminant analysis (LDA) to enhance the speaker features, and then uses SVM and Gaussian mixture models (GMM) to build speaker models. The system was then used to identify the speakers: 20 speakers (10 male and 10 female) each read the Chinese digits 0–9 twenty times, yielding 4,000 files in total; 160 files per speaker were used for training and the remainder for testing.
Finally, the recognition results obtained under several varying background-noise conditions were compared and discussed.
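The PCA stage used to make the features more robust can be illustrated with a short sketch. The 12-dimensional vectors below are synthetic stand-ins for MFCC features, and the plain eigen-decomposition shown is generic PCA, not necessarily the temporal-filter formulation developed in Chapter 4:

```python
import numpy as np

def pca_project(features, k):
    """Project feature vectors onto the k principal components
    (directions of largest variance) of the data."""
    centered = features - features.mean(axis=0)
    cov = np.cov(centered, rowvar=False)          # (D, D) covariance
    eigvals, eigvecs = np.linalg.eigh(cov)        # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]             # re-sort descending
    components = eigvecs[:, order[:k]]            # (D, k) projection basis
    return centered @ components, eigvals[order]

# 200 synthetic 12-dim "MFCC-like" vectors: rank-3 signal plus small noise
rng = np.random.default_rng(1)
base = rng.normal(size=(200, 3)) @ rng.normal(size=(3, 12))
features = base + 0.1 * rng.normal(size=(200, 12))

projected, variances = pca_project(features, k=3)
print(projected.shape)                        # (200, 3)
print(variances[:3].sum() / variances.sum())  # top 3 components carry most variance
```

`np.linalg.eigh` is used because the covariance matrix is symmetric; sorting its eigenvalues in descending order puts the highest-variance directions first, which is what lets a low-dimensional projection discard mostly noise.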
Acknowledgments.............................................i
Abstract in Chinese........................................ii
Abstract in English.......................................iii
Contents....................................................v
List of figures............................................vi
List of tables............................................vii
Chapter 1 Introduction......................................1
1.1 Motivation..............................................1
1.2 Overview of Speaker Recognition.........................1
1.3 Thesis Organization.....................................3
Chapter 2 The Basic Technologies of Speaker Identification..4
2.1 Introduction............................................4
2.2 Feature Extraction......................................5
2.2.1 Pre-emphasis..........................................6
2.2.2 Frame Blocking........................................6
2.2.3 Windowing.............................................7
2.2.4 Fast Fourier Transform................................8
2.2.5 Triangular Bandpass Filter............................9
2.2.6 Logarithm Transform and Discrete Cosine Transform....10
2.2.7 Energy...............................................10
2.3 Speaker Model..........................................11
2.3.1 K-means Clustering Algorithm.........................11
2.3.2 Model Describe.......................................12
2.3.3 Model Parameter Estimation...........................14
2.3.4 Speaker Recognition..................................15
Chapter 3 Support Vector Machines..........................17
3.1 Introduction...........................................17
3.2 Linear Classifier......................................17
3.3 Non-separable Case.....................................20
3.4 Non-linear Classifier..................................21
3.5 Multi-class Classification.............................23
Chapter 4 Robustness Technologies..........................26
4.1 Temporal Filter........................................26
4.2 Principal Component Analysis Temporal Filter...........27
4.3 Linear Discriminant Analysis Temporal Filter...........28
Chapter 5 Experiments and Results..........................31
5.1 System Specification...................................31
5.2 Basic Experiments and Results..........................32
5.2.1 The Experiments in the GMM...........................32
5.2.2 The Experiments in the SVM...........................34
5.3 Robust Techniques Effect for Speaker Identification....35
5.4 Noisy Experiments and Results..........................37
5.4.1 Noisy Experiments in the GMM.........................37
5.4.2 Noisy Experiments in the SVM.........................39
Chapter 6 Conclusions and Future Work......................41
Bibliography...............................................42