Author: Da-Cheng Ou (歐大誠)
Thesis title: A Study on SVM Parameter Optimization for Speaker Verification
Thesis title (original): 應用於語者確認之支撐向量機參數最佳化研究
Advisor: 丁英智
Degree: Master's
Institution: National Formosa University (國立虎尾科技大學)
Department: Graduate Institute of Electrical Engineering
Discipline: Engineering
Field: Electrical Engineering and Computer Science
Thesis type: Academic thesis
Year of publication: 2013
Graduation academic year: 101
Language: Chinese
Number of pages: 63
Keywords: Speaker Verification; Parameter Optimization; Fuzzy GMM-regulated γ; Fuzzy DTW-regulated γ; Fuzzy DTW-regulated C
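This thesis proposes a multi-model hybrid speaker verification technique for support vector machine (SVM) parameter optimization. Its main research direction is to combine the Gaussian mixture model (GMM), dynamic time warping (DTW), the SVM model, and a fuzzy model in order to further improve the recognition performance of conventional speaker verification systems that rely on a single speech model.
Before studying multi-model SVM parameter optimization, the thesis first investigates a multi-model hybrid recognition system. In this part, it designs a hybrid system that performs both speech recognition and speaker recognition. The front end handles speaker verification: it uses a parallel architecture that fuses the GMM and the SVM model, and it makes the verification decision with a voting SVMGMM algorithm. The back end handles speech recognition and uses DTW. Experiments on three types of speech databases confirm that the hybrid method is effective. The front-end speaker verification outperforms a conventional single GMM or a single SVM and reaches a recognition rate of 73.37%. Because the front end has already rejected unsuitable data, the back-end DTW recognition also reaches a recognition rate of 73.70%.
For multi-model SVM parameter optimization, the thesis proposes three ways of tuning SVM parameters: adjusting the SVM parameter γ using GMM speaker recognition, adjusting the SVM parameter γ using DTW speech recognition, and adjusting the SVM parameter C using DTW speech recognition. All three rely on fuzzy modeling for the parameter tuning, which yields the three methods Fuzzy GMM-regulated γ, Fuzzy DTW-regulated γ, and Fuzzy DTW-regulated C. Fuzzy GMM-regulated γ uses a fuzzy control mechanism to adjust γ according to the difference between the model mean vectors of two GMMs, one for valid speakers and one for invalid speakers. This in turn controls the boundary size of the SVM hyperplane and improves the recognition accuracy of the SVM classifier. Experiments show that the SVM classifier tuned by Fuzzy GMM-regulated γ reaches a recognition rate of 89.20%. Fuzzy DTW-regulated γ uses a fuzzy control mechanism to adjust γ according to the difference between the DTW distance values of valid and invalid speakers, which corrects the boundary size of the SVM hyperplane and raises the recognition accuracy of the SVM classifier. The SVM classifier tuned by Fuzzy DTW-regulated γ reaches a recognition rate of 88.89%. Fuzzy DTW-regulated C uses a fuzzy control mechanism to adjust the value of the SVM parameter C according to the same difference in DTW distance values, which allows a suitable boundary size for the SVM hyperplane to be estimated and improves the recognition accuracy of the SVM classifier. The SVM classifier tuned by Fuzzy DTW-regulated C reaches a recognition rate of 84.27%. All three proposed methods achieve better recognition accuracy than conventional SVM speaker verification in which the parameter γ or C is set arbitrarily.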


This thesis presents a speaker verification framework built on support vector machine (SVM) parameter optimization. The framework combines the Gaussian mixture model (GMM), dynamic time warping (DTW), the SVM model, and a fuzzy model to improve on conventional speaker verification based on a single SVM model.
We first study multi-model combination for speaker verification systems. The proposed multi-model system combines speech recognition and speaker verification. Its front end performs speaker verification in a parallel mode that combines the GMM and the SVM model, and a voting SVMGMM algorithm makes the verification decision. Its back end performs speech recognition using DTW. Experiments confirm that the multi-model framework is effective. The front-end speaker verification outperforms both the conventional single GMM and the single SVM model, with a recognition rate of 73.37%. Because the front-end speaker verification has already removed inappropriate test data, the back-end DTW speech recognition reaches an accuracy of 73.70%.
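As an illustration of the back-end idea, the following is a minimal sketch of template matching with a DTW distance. It assumes one-dimensional sequences, an absolute-difference local cost, and the basic three-way step pattern; the function names `dtw_distance` and `recognize` are hypothetical, and the thesis may use different features and path constraints.

```python
import math


def dtw_distance(a, b):
    """Accumulated DTW cost between two 1-D sequences.

    Assumption: absolute difference as the local cost and the basic
    (insertion, deletion, match) step pattern.
    """
    n, m = len(a), len(b)
    # cost[i][j] is the minimal accumulated cost of aligning a[:i] with b[:j].
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # advance in a only
                cost[i][j - 1],      # advance in b only
                cost[i - 1][j - 1],  # advance in both
            )
    return cost[n][m]


def recognize(utterance, templates):
    """Return the label of the template closest to the utterance under DTW."""
    return min(templates, key=lambda label: dtw_distance(utterance, templates[label]))
```

For example, `recognize([1, 2, 3], {"a": [1, 2, 2, 3], "b": [5, 5, 5]})` picks `"a"`, because the repeated `2` in that template can be absorbed by warping at no cost.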
For multi-model SVM parameter optimization, we propose three methods: using GMM speaker verification to optimize the SVM parameter γ, using DTW speech recognition to optimize the SVM parameter γ, and using DTW speech recognition to optimize the SVM parameter C. All three methods employ fuzzy modeling techniques. The first method, Fuzzy GMM-regulated γ, feeds the difference of mean vectors into a fuzzy controller, which outputs the SVM parameter γ. The difference of mean vectors is calculated from the GMM of valid speakers and the GMM of invalid speakers. By regulating γ, the method controls the boundary size of the SVM hyperplane and so improves the verification accuracy of the SVM classifier. Experiments show that Fuzzy GMM-regulated γ achieves 84.26% accuracy. The second method, Fuzzy DTW-regulated γ, feeds the DTW distance into a fuzzy controller, which outputs the SVM parameter γ. The DTW distances are calculated for the valid speakers and the invalid speakers by the DTW algorithm. This method likewise controls the boundary size of the SVM hyperplane to raise the verification accuracy of the SVM classifier. Experiments show that Fuzzy DTW-regulated γ achieves 82.56% accuracy. The third method, Fuzzy DTW-regulated C, feeds the DTW distance into a fuzzy controller, which outputs the SVM parameter C. The DTW distances are again calculated for the valid speakers and the invalid speakers by the DTW algorithm. The method finds a suitable boundary size for the SVM hyperplane to raise the verification accuracy of the SVM classifier. Experiments show that Fuzzy DTW-regulated C achieves an accuracy of 79.93%.
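The fuzzy regulation step can be pictured with a minimal sketch of a fuzzy controller that maps a difference score to γ. Everything specific here is an assumption made for illustration and is not taken from the thesis: the score is normalized to [0, 1], there are three triangular fuzzy sets, the rule outputs in `GAMMA_FOR` are hypothetical, and the defuzzification is a weighted average.

```python
def triangular(x, left, peak, right):
    """Triangular membership function with support (left, right)."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


# Hypothetical fuzzy sets over a difference score normalized to [0, 1].
SETS = {
    "small": (-0.5, 0.0, 0.5),
    "medium": (0.0, 0.5, 1.0),
    "large": (0.5, 1.0, 1.5),
}

# Hypothetical rule consequents: one crisp gamma value per fuzzy set.
GAMMA_FOR = {"small": 1.0, "medium": 0.1, "large": 0.01}


def fuzzy_gamma(difference):
    """Map a normalized difference score to gamma by weighted-average defuzzification."""
    d = min(max(difference, 0.0), 1.0)  # clamp to the assumed input range
    weights = {name: triangular(d, *params) for name, params in SETS.items()}
    total = sum(weights.values())  # equals 1 for these overlapping sets
    return sum(weights[name] * GAMMA_FOR[name] for name in SETS) / total
```

The resulting value could then be passed as the `gamma` argument of an RBF-kernel SVM implementation such as scikit-learn's `SVC`. The same structure would apply to C by swapping in a different set of rule outputs.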
Speaker verification experiments confirm that all three proposed SVM parameter optimization methods achieve higher verification accuracy than the conventional single SVM classifier.
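For orientation, the roles of the two tuned parameters can be read from the standard soft-margin SVM with a Gaussian (RBF) kernel. The formulation below is the common textbook one and is given here as an assumption; the notation in the thesis itself may differ.

```latex
\min_{w,\,b,\,\xi}\ \frac{1}{2}\lVert w \rVert^{2} + C \sum_{i=1}^{N} \xi_i
\quad \text{subject to} \quad
y_i \bigl( w^{\top} \phi(x_i) + b \bigr) \ge 1 - \xi_i, \qquad \xi_i \ge 0,

K(x, x') = \phi(x)^{\top} \phi(x') = \exp\bigl( -\gamma \lVert x - x' \rVert^{2} \bigr).
```

In this form, C weights the penalty on margin violations against the width of the margin, and γ sets how quickly the kernel similarity decays with distance, which is why adjusting either one changes the shape of the decision boundary.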


Abstract (Chinese)
Abstract
Acknowledgments
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Research Motivation
1.2 Overview of Speaker Recognition
1.3 Research Direction
1.4 Chapter Overview
Chapter 2 Fundamental Techniques of Speaker Recognition
2.1 Speech Feature Extraction
2.2 Gaussian Mixture Model (GMM)
2.2.1 Vector Quantization (VQ)
2.2.2 Expectation Maximization (EM) Algorithm
2.2.3 Maximum a Posteriori Criterion
2.2.4 Maximum Likelihood Criterion
2.3 Dynamic Time Warping (DTW)
2.3.1 Path Constraint
2.4 Support Vector Machine (SVM)
2.4.1 Linear SVM Classifier: Linearly Separable Case
2.4.2 Linear SVM Classifier: Linearly Non-separable Case
2.4.3 Kernel Function
2.5 Voting GMM/SVM Algorithm (Algorithm for Voting-GMMSVM)
Chapter 3 SVM Parameter Optimization for Speaker Verification
3.1 The SVM Parameters C and γ
3.2 Multi-Model Combination
3.3 Adjusting the SVM Parameter γ Using GMM Speaker Verification
3.3.1 GMM-regulated γ
3.3.2 Fuzzy GMM-regulated γ
3.4 Adjusting the SVM Parameter γ Using DTW Speaker Verification
3.4.1 DTW-regulated γ
3.4.2 Fuzzy DTW-regulated γ
3.5 Adjusting the SVM Parameter C Using DTW Speaker Verification
3.5.1 DTW-regulated C
3.5.2 Fuzzy DTW-regulated C
3.6 Relationships Among SVM Parameter Settings
Chapter 4 Experimental Results
4.1 Speech Databases
4.2 Recognition Performance Experiments for Multi-Model Combination
4.3 Speaker Verification Experiments for SVM Parameter Optimization
4.3.1 Experimental Results for Fuzzy GMM-regulated γ
4.3.2 Experimental Results for Fuzzy DTW-regulated γ
4.3.3 Experimental Results for Fuzzy DTW-regulated C
4.4 Selection Range for SVM Parameter Settings
Chapter 5 Conclusion
References
Extended Abstract
Curriculum Vitae (CV)


