Author: 余劉鴻
Author (English): Liu-Hung Yu
Title: 基於小波域特徵選擇之語者辨識
Title (English): Speaker Recognition Based on Feature Selection in Wavelet Domain
Advisor: 陳文雄
Advisor (English): Wen-Shiung Chen
Degree: Master's
University: 國立暨南國際大學 (National Chi Nan University)
Department: 通訊工程研究所 (Institute of Communication Engineering)
Discipline: Engineering
Field: Electrical and Information Engineering
Document Type: Academic thesis
Year of Publication: 2008
Academic Year of Graduation: 96 (2007-2008)
Language: Chinese
Number of Pages: 43
Keywords (Chinese): 語者辨識 (speaker recognition), 梅爾頻率 (Mel frequency), 小波轉換 (wavelet transform)
Keywords (English): F-Ratio, MFCC, Wavelet, GMM
Abstract: A speaker recognition system consists of three main parts: speech pre-processing, feature extraction, and classification. For feature extraction we use the conventional Mel-frequency cepstral coefficients (MFCC) together with wavelet features; for classification we exploit the statistical properties of the Gaussian mixture model (GMM), feeding each test utterance into the mixture models to compute likelihood scores. Each of the two feature types is scored against its previously trained model and the two likelihood scores are then combined; the combined system achieves a better recognition rate than conventional MFCC alone. We further apply the F-ratio from different perspectives to select the more discriminative features, reducing the dimensionality of the feature vector and identifying the wavelet-domain feature parameters that are worth keeping.
The experiments use the AURORA 2.0 speech database, which contains 52 male and 57 female speakers, each with 77 digit strings of varying length. In the speaker verification experiments, with 32 Gaussian mixtures, 12 MFCC dimensions, and 15 wavelet-feature dimensions, reducing the dimensionality of the wavelet features by feature selection lowers the error rate from 1.74% to 1.53%. Regarding feature dimensionality, the system performs best when the MFCC and wavelet-feature dimensions are close to each other.
Abstract (English): Speaker recognition can be divided into three parts: pre-processing, feature extraction, and pattern recognition. For feature extraction we use the traditional Mel-frequency cepstral coefficients (MFCC) together with wavelet coefficients. A Gaussian mixture model (GMM) is used to compute similarity scores, and the output scores obtained with the MFCC and wavelet features are combined. The experimental results show that the combined system performs better than using the traditional MFCC features alone. This thesis focuses on feature selection in the wavelet domain: we compute the traditional F-ratio in different ways in order to find the more discriminative features in the wavelet domain.
The speaker recognition experiments are performed on the AURORA 2.0 database, which contains 52 male and 57 female speakers, each providing 77 clean digit strings of varying length. In the speaker verification experiments, with 32 Gaussian mixtures, 12 MFCC dimensions, and 15 wavelet-coefficient dimensions, we reduce the dimensionality of the wavelet coefficients by feature selection, and the equal error rate decreases from 1.73% to 1.53%. With respect to feature dimensionality, the system performs better when the number of wavelet-coefficient dimensions is close to the number of MFCC dimensions.
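The abstract describes score-level fusion of two per-speaker GMMs, one trained on MFCC features and one on wavelet features. The following Python sketch is only a rough illustration of how such fusion could be wired up with scikit-learn's GaussianMixture; the helper names, the equal fusion weight alpha, and the random toy data are assumptions for illustration and are not taken from the thesis.

# Minimal sketch of GMM score-level fusion (hypothetical helper names;
# the thesis does not specify a fusion weight, so `alpha` is an assumption).
import numpy as np
from sklearn.mixture import GaussianMixture

def train_speaker_gmm(features, n_components=32, seed=0):
    """Fit one diagonal-covariance GMM on a speaker's training features
    (rows = frames, columns = feature dimensions)."""
    gmm = GaussianMixture(n_components=n_components,
                          covariance_type="diag", random_state=seed)
    gmm.fit(features)
    return gmm

def fused_score(gmm_mfcc, gmm_wav, mfcc_test, wav_test, alpha=0.5):
    """Combine the average frame log-likelihoods of the MFCC model and the
    wavelet-feature model for one test utterance."""
    return alpha * gmm_mfcc.score(mfcc_test) + (1.0 - alpha) * gmm_wav.score(wav_test)

# Toy usage with random data standing in for real MFCC / wavelet features
# (12 and 15 dimensions, matching the setup reported in the abstract).
rng = np.random.default_rng(0)
mfcc_train, wav_train = rng.normal(size=(500, 12)), rng.normal(size=(500, 15))
mfcc_test, wav_test = rng.normal(size=(80, 12)), rng.normal(size=(80, 15))

gmm_mfcc = train_speaker_gmm(mfcc_train)
gmm_wav = train_speaker_gmm(wav_train)
print("fused log-likelihood:", fused_score(gmm_mfcc, gmm_wav, mfcc_test, wav_test))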
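The thesis also ranks wavelet-domain features by F-ratio (and a modified F-ratio) to discard less discriminative dimensions. The sketch below implements only the common textbook F-ratio, the ratio of the between-speaker variance of per-dimension means to the average within-speaker variance, and keeps the top-k dimensions; the exact formulation and the modified variant used in Chapter 4 may differ, and all names and toy data here are illustrative.

# Sketch of F-ratio feature ranking, assuming the common definition
# (between-speaker variance of per-dimension means divided by the average
# within-speaker variance); the thesis's modified F-ratio may differ.
import numpy as np

def f_ratio(features_per_speaker):
    """features_per_speaker: list of (n_frames_i, n_dims) arrays, one per speaker.
    Returns one F-ratio per feature dimension."""
    means = np.stack([f.mean(axis=0) for f in features_per_speaker])  # (n_spk, n_dims)
    within = np.stack([f.var(axis=0) for f in features_per_speaker])  # (n_spk, n_dims)
    between_var = means.var(axis=0)   # spread of speaker means per dimension
    within_var = within.mean(axis=0)  # average within-speaker spread
    return between_var / within_var

def select_top_k(features_per_speaker, k):
    """Indices of the k most discriminative dimensions by F-ratio."""
    ratios = f_ratio(features_per_speaker)
    return np.argsort(ratios)[::-1][:k]

# Toy usage: 5 speakers, 15-dimensional wavelet features, keep the best 12.
rng = np.random.default_rng(1)
spk_feats = [rng.normal(loc=i * 0.1, size=(200, 15)) for i in range(5)]
print("selected dimensions:", select_top_k(spk_feats, k=12))

Trimming the 15 wavelet dimensions toward the 12 MFCC dimensions in this way mirrors the observation in the abstract that performance is best when the two feature sets have similar dimensionality.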
Acknowledgements i
Chinese Abstract ii
Abstract iii
Table of Contents iv
List of Figures vii
List of Tables ix
Chapter 1 Introduction 1
1.1 Overview of Biometrics 1
1.2 Development of Speaker Recognition Techniques and Research 2
1.3 Motivation and Objectives 5
1.4 Speech Database 6
1.5 Thesis Organization 6
Chapter 2 Fundamentals of Speaker Recognition 7
2.1 Speech Signal Pre-processing 7
2.1.1 End-point Detection 7
2.1.2 Pre-emphasis 8
2.1.3 Frame Blocking 9
2.1.4 Hamming Window 9
2.2 Feature Extraction 10
2.2.1 Mel-Frequency Cepstral Coefficients 10
2.3 Speaker Models 13
2.3.1 Gaussian Mixture Model 13
2.3.2 Model Training and Parameter Estimation 14
2.4 Speaker Recognition 17
2.4.1 Speaker Identification 18
2.4.2 Speaker Verification 19
2.4.3 Threshold Selection 20
Chapter 3 Wavelet Analysis 22
3.1 Window Functions 22
3.2 Integral Wavelet Transform 24
3.3 Discrete Wavelet Transform 25
Chapter 4 Feature Selection 28
4.1 F-Ratio 28
4.2 Modified F-Ratio 29
Chapter 5 Experimental Results 31
5.1 Experimental Corpus 31
5.2 System Flowchart 32
5.3 Speaker Verification 33
5.3.1 Experiment 1: Mel-Frequency Cepstral Coefficients 33
5.3.2 Experiment 2: Combining Wavelet Features with MFCC 34
5.3.3 Experiment 3: Selection of Wavelet Features 36
5.3.4 Experiment 4: Effect of the Number of Training Utterances 38
5.4 Speaker Identification 39
5.4.1 Experiment 1: Selection of Wavelet Features 39
Chapter 6 Conclusions and Future Work 41
6.1 Conclusions 41
6.2 Future Work 41
References 42