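Graduate student: 吳昱弘 (Yu-hung Wu)
Thesis title: PSO Algorithm for Speaker Model Training and Adaptation (粒子群演算法應用於語者模型訓練與調適之研究)
Advisor: 莊堯棠 (Yau-tarng Juang)
Degree: Master's
Institution: National Central University (國立中央大學)
Department: Department of Electrical Engineering
Discipline: Engineering
Field: Electrical Engineering and Computer Science
Year of publication: 2013
Academic year of graduation: 101 (ROC calendar)
Language: Chinese
Number of pages: 58
Chinese keywords: 粒子群演算法 (particle swarm optimization); 語者模型 (speaker model)
English keywords: PSO algorithm; Speaker model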
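This thesis applies particle swarm optimization (PSO) to speaker model training and adaptation. Because its concept is simple, it converges quickly, and it is easy to implement, PSO is more effective than genetic algorithms on a wide range of engineering problems. All PSO used in this thesis is the unmodified algorithm: our fitness function is the Gaussian mixture probability density function, whose mathematical form is not overly complicated, so we use only the original PSO. In conventional speaker verification systems, model parameters are mostly estimated with the Expectation-maximization (EM) algorithm, which takes comparatively long to train a model to convergence. We therefore propose a new training method that uses PSO to converge the model. The experimental results show a lower equal error rate and decision cost function than the EM algorithm, together with faster model training, which confirms the effectiveness of the proposed method. In addition, in speaker model adaptation the mean vectors are the most important parameters of the speaker-independent model. This thesis incorporates PSO to obtain the best mean vectors, and the experimental results show that the proposed method improves speaker verification performance compared with the original Maximum a Posteriori (MAP) adaptation.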
This thesis applies particle swarm optimization (PSO) to speaker model training and adaptation. Conventionally, the Expectation-maximization (EM) algorithm is the dominant approach to model parameter estimation in speaker verification. The experimental results show that the proposed PSO algorithm converges faster in training and yields more accurate speaker verification than the EM algorithm. In addition, this thesis uses the proposed PSO algorithm to adjust the mean parameters in speaker model adaptation. Experimental results again show that the proposed method outperforms Maximum a Posteriori (MAP) adaptation in speaker verification.
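The idea in the abstract, unmodified PSO searching for Gaussian mixture mean vectors with the mixture likelihood as the fitness function, can be sketched as follows. This is a minimal illustration and not the thesis's implementation: the function names, the diagonal covariances, the fixed weights and variances, the swarm size, the inertia weight `w`, and the acceleration constants `c1` and `c2` are all assumptions chosen for the example.

```python
import numpy as np

def gmm_avg_loglik(x, weights, means, variances):
    """Average log-likelihood of frames x (N, D) under a diagonal GMM.

    weights: (M,), means: (M, D), variances: (M, D). Hypothetical helper.
    """
    diff = x[:, None, :] - means[None, :, :]                      # (N, M, D)
    log_norm = -0.5 * np.sum(np.log(2.0 * np.pi * variances), axis=1)
    log_comp = log_norm - 0.5 * np.sum(diff ** 2 / variances, axis=2)
    log_mix = log_comp + np.log(weights)                          # (N, M)
    # Log-sum-exp over mixture components for numerical stability.
    top = log_mix.max(axis=1, keepdims=True)
    frame_ll = top[:, 0] + np.log(np.exp(log_mix - top).sum(axis=1))
    return float(frame_ll.mean())

def pso_train_means(x, weights, variances, init_means,
                    n_particles=20, n_iter=50, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Basic PSO over GMM mean vectors; fitness is the average log-likelihood.

    All hyperparameter defaults are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    spread = 0.5 * x.std(axis=0)
    # Each particle is a full set of mean vectors, shape (M, D).
    noise = rng.standard_normal((n_particles,) + init_means.shape)
    pos = init_means[None] + spread * noise
    pos[0] = init_means  # keep the starting model as one particle
    vel = np.zeros_like(pos)

    fit = np.array([gmm_avg_loglik(x, weights, p, variances) for p in pos])
    pbest, pbest_fit = pos.copy(), fit.copy()
    g = int(np.argmax(pbest_fit))
    gbest, gbest_fit = pbest[g].copy(), float(pbest_fit[g])

    for _ in range(n_iter):
        r1 = rng.random(pos.shape)
        r2 = rng.random(pos.shape)
        # Standard velocity update: inertia + cognitive + social terms.
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest[None] - pos)
        pos = pos + vel
        fit = np.array([gmm_avg_loglik(x, weights, p, variances) for p in pos])
        improved = fit > pbest_fit
        pbest[improved] = pos[improved]
        pbest_fit[improved] = fit[improved]
        g = int(np.argmax(pbest_fit))
        if pbest_fit[g] > gbest_fit:
            gbest, gbest_fit = pbest[g].copy(), float(pbest_fit[g])
    return gbest, gbest_fit
```

Because the starting means are kept as one particle, the returned fitness can never be worse than that of the initial model. A MAP-style adaptation variant could reuse the same loop by starting from the speaker-independent means and scoring against the target speaker's data, though the exact formulation used in the thesis is not given in this record.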
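Abstract (Chinese)
Abstract
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Research Motivation
1.2 Overview of Speaker Recognition
1.3 Overview of Speaker Adaptation
1.4 Research Direction
1.5 Literature Review
1.6 Thesis Organization
Chapter 2 Basic Techniques of Speaker Identification
2.1 Speech Feature Extraction
2.1.1 Framing
2.1.2 Pre-Emphasis
2.1.3 Windowing
2.1.4 Parameter Extraction
2.1.5 Delta Cepstral Parameters
2.2 Gaussian Mixture Model
2.2.1 Model Description
2.2.2 Effect of the Amount and Variability of Speech Data on the Choice of the Number of Gaussians
Chapter 3 Applying Particle Swarm Optimization to Speaker Model Training
3.1 Introduction
3.2 Basic Formulas and Models of the PSO Algorithm
3.3 Inertia Weight
3.4 Optimizing Model Parameters with the PSO Algorithm
Chapter 4 Expectation-Maximization Algorithm
4.1 EM Algorithm
Chapter 5 Applying the PSO Algorithm to Speaker Model Adaptation
5.1 Bayesian Adaptation
5.2 MAP Adaptation Combined with the PSO Algorithm
Chapter 6 Experiments and Discussion
6.1 Experimental Corpus
6.2 Performance Evaluation Methods
6.2.1 Equal Error Rate (EER)
6.2.2 Decision Cost Function (DCF)
6.3 Comparison of EM and PSO
6.3.1 Experiment 1: Effect of Maximizing the Mean Vectors with the EM Algorithm
6.3.2 Experiment 2: Finding the Best Mean Vectors with the PSO Algorithm
6.4 Comparison of MAP and MAP-PSO
6.4.1 Experiment 3: Using the PSO Algorithm in Speaker Model Adaptation
6.5 Comparison of EM-MAP and PSO-MAP-PSO
6.5.1 Experiment 4: Using the PSO Algorithm in Both Speaker Model Training and Adaptation
Chapter 7 Conclusions and Future Work
7.1 Conclusions
7.2 Future Work
References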

