臺灣博碩士論文加值系統 (National Digital Library of Theses and Dissertations in Taiwan)

Detailed Record

Author: 周文德
Author (romanized): Wun-De Jhou
Title (Chinese): 以支援向量機為基礎之語者識別研究
Title (English): Research of SVM-Based Speaker Identification
Advisor: 吳俊德
Advisor (romanized): Gin-Der Wu
Degree: Master's
Institution: National Chi Nan University (國立暨南國際大學)
Department: Department of Electrical Engineering
Discipline: Engineering
Field: Electrical and Information Engineering
Document type: Academic thesis
Publication year: 2008
Graduation academic year: 96 (2007–2008)
Language: English
Pages: 43
Keywords (Chinese): 支援向量機、語者識別、梅爾倒頻譜參數、主成分分析、線性鑑別分析、高斯混合模型
Keywords (English): support vector machine (SVM), speaker identification, Mel-frequency cepstral coefficients (MFCC), principal component analysis (PCA), linear discriminant analysis (LDA), Gaussian mixture model (GMM)
Record statistics:
  • Cited by: 0
  • Views: 347
  • Rating: (none)
  • Downloads: 74
  • Bookmarked: 0
Abstract (translated from Chinese):
This thesis compares support vector machine (SVM) training models with other modeling approaches, and applies different robustness methods to raise the recognition rate of a speaker identification system. In this study, Mel-frequency cepstral coefficients (MFCC) are computed directly from the speech data and used as the feature parameters for speaker analysis.
However, background noise in everyday environments, such as street noise or factory noise, can severely degrade speaker identification. This thesis applies principal component analysis (PCA) and linear discriminant analysis (LDA) to make the feature parameters more robust, and then builds speaker models with SVM and with Gaussian mixture models (GMM). The system was then used to identify speakers: 20 speakers (10 male, 10 female) provided 4,000 recordings in total, each speaker reading the Chinese digits 0–9 twenty times; 160 recordings per speaker served as reference (training) data and the remainder as test data. Recognition rates were measured under rapidly varying background noise for the different robustness and modeling configurations, and the results were compared and discussed.
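The identification step the abstract describes — scoring a test utterance against each enrolled speaker's Gaussian mixture model and choosing the highest-likelihood speaker — can be sketched as follows. This is a minimal illustration with synthetic, hand-set model parameters (no EM training) and made-up speaker names, not the thesis's actual models:

```python
import numpy as np

def gmm_loglik(frames, weights, means, variances):
    """Total log-likelihood of feature frames under a
    diagonal-covariance Gaussian mixture model.
    frames: (T, D); weights: (M,); means, variances: (M, D)."""
    diff = frames[:, None, :] - means[None, :, :]                    # (T, M, D)
    exponent = -0.5 * np.sum(diff ** 2 / variances, axis=2)          # (T, M)
    log_norm = -0.5 * np.sum(np.log(2 * np.pi * variances), axis=1)  # (M,)
    log_comp = np.log(weights) + log_norm + exponent                 # (T, M)
    # log-sum-exp over mixture components, then sum over frames
    m = log_comp.max(axis=1, keepdims=True)
    return float(np.sum(m[:, 0] + np.log(np.exp(log_comp - m).sum(axis=1))))

def identify(frames, speaker_models):
    """Return the enrolled speaker whose GMM scores highest."""
    scores = {spk: gmm_loglik(frames, *p) for spk, p in speaker_models.items()}
    return max(scores, key=scores.get)

# Two synthetic 2-mixture speaker models in a 3-dim feature space
rng = np.random.default_rng(0)
models = {
    "speaker_A": (np.array([0.5, 0.5]), np.zeros((2, 3)), np.ones((2, 3))),
    "speaker_B": (np.array([0.5, 0.5]), np.full((2, 3), 4.0), np.ones((2, 3))),
}
test_frames = rng.normal(loc=4.0, scale=1.0, size=(50, 3))  # lies near speaker_B
print(identify(test_frames, models))  # → speaker_B
```

In a real system the frames would be MFCC vectors and the model parameters would come from EM training on each speaker's reference recordings.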
Abstract (English):
This thesis investigates support vector machine (SVM) based speaker models, compares them with other modeling approaches, and applies different robustness methods to improve the performance of a speaker identification system. In this study, Mel-frequency cepstral coefficients (MFCC) are extracted from the speech data as the features for speaker identification.
However, background noise in real environments, such as street or factory noise, can degrade identification performance. The thesis employs principal component analysis (PCA) and linear discriminant analysis (LDA) to enhance the speaker features, and then uses SVM and Gaussian mixture models (GMM) to build speaker models. The system was then used to identify the speakers: 20 speakers (10 male and 10 female) each read the Chinese digits 0–9 twenty times, yielding 4,000 files in total; 160 files per speaker were used for training and the remainder for testing.
Finally, the recognition results obtained under several varying background-noise conditions were compared and discussed.
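The PCA stage used to make the features more robust can be illustrated with a short sketch. The 12-dimensional vectors below are synthetic stand-ins for MFCC features, and the plain eigen-decomposition shown is generic PCA, not necessarily the temporal-filter formulation developed in Chapter 4:

```python
import numpy as np

def pca_project(features, k):
    """Project feature vectors onto the k principal components
    (directions of largest variance) of the data."""
    centered = features - features.mean(axis=0)
    cov = np.cov(centered, rowvar=False)          # (D, D) covariance
    eigvals, eigvecs = np.linalg.eigh(cov)        # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]             # re-sort descending
    components = eigvecs[:, order[:k]]            # (D, k) projection basis
    return centered @ components, eigvals[order]

# 200 synthetic 12-dim "MFCC-like" vectors: rank-3 signal plus small noise
rng = np.random.default_rng(1)
base = rng.normal(size=(200, 3)) @ rng.normal(size=(3, 12))
features = base + 0.1 * rng.normal(size=(200, 12))

projected, variances = pca_project(features, k=3)
print(projected.shape)                        # (200, 3)
print(variances[:3].sum() / variances.sum())  # top 3 components carry most variance
```

`np.linalg.eigh` is used because the covariance matrix is symmetric; sorting its eigenvalues in descending order puts the highest-variance directions first, which is what lets a low-dimensional projection discard mostly noise.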
Acknowledgments.............................................i
Abstract in Chinese........................................ii
Abstract in English.......................................iii
Contents....................................................v
List of figures............................................vi
List of tables............................................vii
Chapter 1 Introduction......................................1
1.1 Motivation..............................................1
1.2 Overview of Speaker Recognition.........................1
1.3 Thesis Organization.....................................3
Chapter 2 The Basic Technologies of Speaker Identification..4
2.1 Introduction............................................4
2.2 Feature Extraction......................................5
2.2.1 Pre-emphasis..........................................6
2.2.2 Frame Blocking........................................6
2.2.3 Windowing.............................................7
2.2.4 Fast Fourier Transform................................8
2.2.5 Triangular Bandpass Filter............................9
2.2.6 Logarithm Transform and Discrete Cosine Transform....10
2.2.7 Energy...............................................10
2.3 Speaker Model..........................................11
2.3.1 K-means Clustering Algorithm.........................11
2.3.2 Model Describe.......................................12
2.3.3 Model Parameter Estimation...........................14
2.3.4 Speaker Recognition..................................15
Chapter 3 Support Vector Machines..........................17
3.1 Introduction...........................................17
3.2 Linear Classifier......................................17
3.3 Non-separable Case.....................................20
3.4 Non-linear Classifier..................................21
3.5 Multi-class Classification.............................23
Chapter 4 Robustness Technologies..........................26
4.1 Temporal Filter........................................26
4.2 Principal Component Analysis Temporal Filter...........27
4.3 Linear Discriminant Analysis Temporal Filter...........28
Chapter 5 Experiments and Results..........................31
5.1 System Specification...................................31
5.2 Basic Experiments and Results..........................32
5.2.1 The Experiments in the GMM...........................32
5.2.2 The Experiments in the SVM...........................34
5.3 Robust Techniques Effect for Speaker Identification....35
5.4 Noisy Experiments and Results..........................37
5.4.1 Noisy Experiments in the GMM.........................37
5.4.2 Noisy Experiments in the SVM.........................39
Chapter 6 Conclusions and Future Work......................41
Bibliography...............................................42