National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)

Detailed Record

Author: 林冠良
Author (English): Lin, Guan-Liang
Title: 一個基於MFCC的語者識別系統
Title (English): An MFCC-based Speaker Identification System
Advisor: 呂芳懌
Advisor (English): Leu, Fang-Yie
Committee members: 陳金鈴, 楊伏夷, 余心淳, 劉榮春
Oral defense date: 2017-01-05
Degree: Master's
Institution: Tunghai University (東海大學)
Department: Computer Science and Information Engineering (資訊工程學系)
Discipline: Engineering
Field: Electrical and Computer Engineering
Document type: Academic thesis
Year of publication: 2017
Graduation academic year: 105 (2016–2017)
Language: English
Pages: 53
Keywords (Chinese): 語者辨識; 傅立葉轉換; 梅爾頻率倒譜係數; 高斯混合模型; 聲學模型
Keywords (English): speaker identification; Fourier transformation; Mel-frequency cepstral coefficients; Gaussian mixture model; acoustic model
Metrics:
  • Cited by: 1
  • Views: 556
  • Rating:
  • Downloads: 32
  • Bookmarked: 1
Abstract (Chinese, translated): Speech recognition already has many practical everyday applications, such as Siri, the iPhone's voice assistant, Google's speech-input recognition system, and voice-operated mobile phones; speaker identification, by contrast, is still relatively immature. This study therefore focuses on speaker identification. First, the raw voice signal of a person, e.g., Xiao-Ming, is converted from the time domain to the frequency domain by the Fourier transform. Next, a model of human auditory perception filters the spectrum and extracts the energy in each frequency band, turning the speech into feature data. The probability density function of a Gaussian mixture model is then used to describe the distribution of these features, which becomes Xiao-Ming's acoustic model. When the system receives speech data from an unknown person, it processes the data in the same way and compares it for similarity against the acoustic models of the people collected in the database (including Xiao-Ming) to identify who the unknown person is likely to be.
Abstract (English): Nowadays, speech recognition has many practical applications in everyday use; typical examples are Apple's Siri on the iPhone, the Google speech recognition system, and mobile phones operated by voice. Speaker identification, by contrast, is still relatively immature at its current stage. Therefore, in this thesis we study a speaker identification technique. It first takes the original voice signals of a person, e.g., Bob, and converts them from the time domain to the frequency domain with the Fourier transformation. An MFCC-based human auditory filtering model is then applied to weight the energy levels of the different frequencies as the quantified characteristics of Bob's voice, and the energies are normalized to a logarithmic scale to form the features of the voice signals. Next, the probability density function of a Gaussian mixture model is fitted to the distribution of these logarithmic features, yielding Bob's specific acoustic model. When the system receives the voice of an unknown person, e.g., x, it processes the voice with the same procedure and compares the result, x's acoustic model, with the acoustic models of known people, collected beforehand in an acoustic-model database, to identify the most likely speaker.
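The pipeline the abstract describes maps onto a few short functions. Below is a minimal sketch in Python, assuming NumPy, SciPy, and scikit-learn (the thesis lists Python and SciPy among its tools; scikit-learn, every function name, and all parameter values here are this sketch's assumptions, not the author's implementation). One deliberate substitution: the thesis compares acoustic models with the Bhattacharyya distance (Section 4.3), while this sketch scores candidates by average log-likelihood, a common alternative.

```python
import numpy as np
from scipy.fft import dct
from sklearn.mixture import GaussianMixture

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc(signal, sr, n_fft=512, frame_len=0.025, frame_step=0.010,
         n_filters=26, n_ceps=13):
    """Frame -> Hamming window -> FFT power spectrum -> mel filterbank
    -> log energies -> DCT: the MFCC steps named in the abstract."""
    flen, fstep = int(frame_len * sr), int(frame_step * sr)
    n_frames = 1 + (len(signal) - flen) // fstep
    idx = np.arange(flen)[None, :] + fstep * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(flen)
    # Time domain -> frequency domain (Fourier transform), power spectrum.
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # Triangular mel filters approximating human auditory resolution.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):
        lo, mid, hi = bins[i], bins[i + 1], bins[i + 2]
        fbank[i, lo:mid] = (np.arange(lo, mid) - lo) / max(mid - lo, 1)
        fbank[i, mid:hi] = (hi - np.arange(mid, hi)) / max(hi - mid, 1)
    # Log filterbank energies, then DCT to decorrelate into cepstra.
    log_energy = np.log(power @ fbank.T + 1e-10)
    return dct(log_energy, type=2, axis=1, norm='ortho')[:, :n_ceps]

def train_speaker_model(features, n_components=16):
    # GMM fitted by EM with k-means initialisation (scikit-learn's
    # default), mirroring the k-means + EM steps of Sections 4.2.1-4.2.2.
    return GaussianMixture(n_components, covariance_type='diag').fit(features)

def identify(features, models):
    # Score the unknown utterance against every enrolled acoustic model;
    # average log-likelihood stands in for the thesis's Bhattacharyya
    # distance comparison (an assumption of this sketch).
    return max(models, key=lambda name: models[name].score(features))
```

Enrollment would run mfcc() on each known speaker's recordings and store the GMM returned by train_speaker_model() under that speaker's name; identification runs the same mfcc() on the unknown utterance and passes the resulting features, together with the stored models, to identify().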
1. Introduction…………………………………………………………………… 1
2. Related Work…………………………………………………………………… 3
2.1 Voice Recognition System…………………………………………… 3
2.2 The Environmental Noise…………………………………………… 4
2.3 Operational Efficiency……………………………………………… 5
3. Background of This Study………………………………………………………… 7
3.1 Feature Extraction………………………………………………… 7
3.2 Building Speaker Model…………………………………………… 9
3.2.1 Gaussian Mixture Model………………………………… 9
3.2.2 Training Phase……………………………………………… 11
3.3 The Speaker Identification Method………………………………… 11
4. The System Architecture……………………………………………………… 13
4.1 MFCC Process……………………………………………………… 13
4.2 Establishment of Gaussian Mixture Model………………………… 18
4.2.1 K-means Clustering………………………………………… 18
4.2.2 EM Algorithm……………………………………………… 20
4.3 Bhattacharyya Distance……………………………………………… 21
5. System Implementation and Evaluation……………………………………… 23
5.1 Experiment 1………………………………………………………… 25
5.2 Experiment 2………………………………………………………… 28
5.3 Experiment 3………………………………………………………… 30
5.4 Experiment 4………………………………………………………… 33
5.5 Experiment 5………………………………………………………… 35
6. Conclusion and Future Studies………………………………………………… 40
References………………………………………………………………………… 42