National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)

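Author: 羅育仁 (Luo, Yu Ren)
Thesis title: 運用支持向量機在特定語句語者驗證之研究
Thesis title (English): The Study on Text-dependent Speaker Verification Using Support Vector Machine
Advisor: 陳璽煌 (Chen, Shi Huang)
Degree: Master's
Institution: Shu-Te University (樹德科技大學)
Department: Department of Computer Science and Information Engineering (資訊工程學系)
Discipline: Engineering
Field: Electrical and Computer Engineering
Thesis type: Academic thesis
Year of publication: 2009
Academic year of graduation: 97 (ROC calendar)
Language: Chinese
Pages: 63
Keywords (Chinese): 線性預測係數; 線頻譜頻對; 梅爾倒頻譜; 支持向量機器; 語者確認; 語者驗證
Keywords (English): linear prediction coefficient (LPC); line spectrum frequency (LSF); mel-frequency cepstral coefficient (MFCC); support vector machine (SVM); speaker identification; speaker verification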
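With advances in speaker recognition, the technology is now widely used in speech systems such as access control, database access, credit card verification, telephone banking, and security authentication. Its security, however, still leaves considerable room for improvement. To improve the recognition performance of speaker recognition systems, this thesis proposes a new algorithm that uses speaker feature parameters and a support vector machine (SVM) to verify a speaker from a specific spoken phrase.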

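First, the algorithm uses a spoken phrase chosen by the speaker as the password. An SVM is then trained on the feature parameters of that phrase to produce a model of the speaker, after which the SVM can distinguish the true speaker from impostors. For the features, this thesis compares six parameter sets: linear prediction coefficients (LPC), line spectrum frequencies (LSF), mel-frequency cepstral coefficients (MFCC), LPC combined with MFCC (LPC+MFCC), delta cepstral coefficients (ΔMFCC), and second-order delta cepstral coefficients (ΔΔMFCC). Speaker recognition is then performed with each set. The experiments use the Aurora-2.0 digital speech database.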
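The accept/reject step of an SVM-based verifier can be sketched as follows. This is only an illustration under stated assumptions: the RBF kernel, the `gamma` value, the averaging of frame scores, and the two hand-picked support vectors are all hypothetical, not the thesis's trained model or its actual decision rule.

```python
import math

def rbf_kernel(u, v, gamma=0.5):
    """Gaussian (RBF) kernel K(u, v) = exp(-gamma * ||u - v||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * sq_dist)

def svm_score(x, support_vectors, dual_coefs, bias, gamma=0.5):
    """Kernel SVM decision value f(x) = sum_i (alpha_i * y_i) * K(sv_i, x) + b."""
    return sum(c * rbf_kernel(sv, x, gamma)
               for sv, c in zip(support_vectors, dual_coefs)) + bias

def verify(frames, support_vectors, dual_coefs, bias, threshold=0.0):
    """Accept the claimed speaker if the mean frame score exceeds the threshold.

    Averaging over frames is an assumption made for this sketch.
    """
    scores = [svm_score(f, support_vectors, dual_coefs, bias) for f in frames]
    return sum(scores) / len(scores) > threshold

# Hypothetical toy model: one support vector per class in a 2-D feature space.
support_vectors = [[1.0, 1.0], [-1.0, -1.0]]
dual_coefs = [1.0, -1.0]  # alpha_i * y_i: +1 side = target speaker, -1 side = impostor
bias = 0.0

claimed = [[0.9, 1.1], [1.2, 0.8]]       # frames close to the target speaker
impostor = [[-1.0, -0.9], [-0.8, -1.2]]  # frames close to the impostor side
accept_claimed = verify(claimed, support_vectors, dual_coefs, bias)
accept_impostor = verify(impostor, support_vectors, dual_coefs, bias)
```

In a real system the support vectors, coefficients, and bias would come from training on the enrolled speaker's password utterances and on impostor data.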


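The experiments show that a speaker recognition system using second-order delta MFCC (ΔΔMFCC) as the speech feature achieves a better recognition rate than systems using LPC, LSF, MFCC, LPC-MFCC, or ΔMFCC. Using only 22nd-order second-order delta MFCC, the proposed speaker verification algorithm attains an average accuracy of 95.37% and an EER of 0.0%, which shows that ΔΔMFCC yields a better overall recognition rate for speaker recognition than the other parameters.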
With the progress of speaker recognition, such technologies have been widely used in speech-related applications, e.g., access control, database access, credit card verification, telephone banking, and security systems. However, their reliability still needs to be improved. To increase the performance of speaker recognition systems, this thesis proposes a new text-dependent speaker verification system using various speaker features and a support vector machine (SVM).

First, the proposed algorithm uses a speech signal spoken by the user as the password. The SVM is then applied to train a speaker model using the speaker features extracted from the speech signal. Finally, the proposed algorithm performs text-dependent speaker verification with the trained speaker model. This thesis studies six sets of speaker features on the proposed SVM-based speaker verification system: linear prediction coefficient (LPC), line spectrum frequency (LSF), mel-frequency cepstral coefficient (MFCC), LSF-MFCC, delta-MFCC, and delta-delta-MFCC. Experiments are conducted on the Aurora-2 speech database with different orders of the speaker features.
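Delta and delta-delta features describe how cepstral coefficients change over time. A minimal sketch of one common way to compute them, assuming the standard regression formula with a window of two frames on each side and edge frames repeated at the boundaries; the thesis may use a different formula, and the function name and toy input are illustrative only:

```python
def delta(frames, width=2):
    """Regression-based delta of a sequence of feature frames (list of lists).

    d_t = sum_{n=1..width} n * (c_{t+n} - c_{t-n}) / (2 * sum_{n=1..width} n^2)
    """
    num_frames = len(frames)
    denom = 2 * sum(n * n for n in range(1, width + 1))
    out = []
    for t in range(num_frames):
        row = []
        for k in range(len(frames[t])):
            acc = 0.0
            for n in range(1, width + 1):
                nxt = frames[min(t + n, num_frames - 1)][k]  # clamp at the last frame
                prv = frames[max(t - n, 0)][k]               # clamp at the first frame
                acc += n * (nxt - prv)
            row.append(acc / denom)
        out.append(row)
    return out

# Toy "MFCC" track: a single coefficient rising by 1.0 per frame.
mfcc = [[float(t)] for t in range(6)]
d1 = delta(mfcc)  # delta-MFCC: slope of the coefficient track
d2 = delta(d1)    # delta-delta-MFCC: delta applied to the delta track
```

For interior frames of this linear track the delta equals the slope, 1.0; the values near the edges are smaller because of the boundary clamping.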
The experimental results show that delta-delta-MFCC gives better speaker verification performance than the other five features, i.e., LPC, LSF, MFCC, LSF-MFCC, and delta-MFCC. The proposed text-dependent speaker verification system, based on 26th-order delta-delta-MFCC and an SVM, achieves an equal error rate (EER) of 0.0% and an average accuracy of 95.37%.
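The equal error rate (EER) is the operating point at which the false acceptance rate equals the false rejection rate. A minimal sketch of how it can be estimated from verification scores, assuming that a higher score means "more likely the claimed speaker"; the function name and the toy scores are illustrative and not taken from the thesis:

```python
def equal_error_rate(genuine, impostor):
    """Return (eer, threshold) at the threshold where FAR and FRR are closest.

    FAR: fraction of impostor scores accepted (score >= threshold).
    FRR: fraction of genuine scores rejected (score < threshold).
    """
    best = None
    for thr in sorted(set(genuine) | set(impostor)):
        far = sum(s >= thr for s in impostor) / len(impostor)
        frr = sum(s < thr for s in genuine) / len(genuine)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, thr)
    return best[1], best[2]

# Perfectly separated toy scores: some threshold makes no errors at all.
eer_sep, thr_sep = equal_error_rate([0.9, 0.8, 0.7, 0.6], [0.1, 0.2, 0.3, 0.4])

# Overlapping toy scores: one genuine trial scores low, one impostor scores high.
eer_mix, thr_mix = equal_error_rate([0.9, 0.8, 0.35, 0.6], [0.1, 0.2, 0.3, 0.7])
```

With perfectly separated scores, as in the first call, a threshold exists that rejects every impostor and accepts every genuine trial, which is what an EER of 0.0% means.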
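Abstract (Chinese)
Abstract
Acknowledgments
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Research Motivation
1.2 Overview of Speaker Recognition Systems
1.3 Related Work
1.4 Research Method
1.5 Thesis Organization
Chapter 2 Speech Signal Processing and Feature Extraction
2.1 Speech Signal Preprocessing
2.2 Mel-Frequency Cepstrum
2.2.1 Framing
2.2.2 Pre-emphasis
2.2.3 Hamming Window
2.2.4 Discrete Fourier Transform
2.2.5 Mel Filter Bank
2.3 Delta Cepstral Coefficients
2.4 Line Spectrum Pairs
2.4.1 Introduction to Linear Prediction
2.4.2 Introduction to Line Spectrum Frequencies
2.4.3 LPC-to-LSF Conversion
2.4.4 LSF-to-LPC Conversion
Chapter 3 Support Vector Machines
3.1 Linear Support Vector Machines
3.2 Nonlinear Support Vector Machines
3.3 Kernel Functions
Chapter 4 Speaker Recognition Methods and Results
4.1 Speaker Recognition Experiments
4.1.1 Speaker Verification System - Experiment 1
4.1.2 Speaker Verification System - Experiment 2
4.1.3 Speaker Verification System - Experiment 3
4.1.4 Speaker Verification System - Experiment 4
4.1.5 Speaker Verification System - Experiment 5
4.2 Results for Different Feature Parameters
Chapter 5 Conclusions and Future Work
References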