National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)

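Author: 江宗憲
Author (romanized): Tung-Hsien Chiang
Thesis title: 基於離散小波轉換之語者辨識
Thesis title (English): Discrete Wavelet Transform Based Speaker Recognition
Advisor: 吳俊德
Advisor (romanized): Gin-Der Wu
Degree: Master's
Institution: National Chi Nan University (國立暨南國際大學)
Department: Department of Electrical Engineering
Discipline: Engineering
Field: Electrical engineering and computer science
Thesis type: Academic thesis
Year of publication: 2007
Graduation academic year: 95 (ROC calendar)
Language: English
Number of pages: 65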
Keywords (Chinese): 語者辨識; 小波轉換; 音框; 分頻; 梅爾倒頻譜
Keywords (English): speaker recognition; wavelet transform; frame; classified frequency; Mel-frequency cepstral coefficients (MFCC)
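This thesis proposes a speaker recognition system based on the discrete wavelet transform (DWT). In much recent speaker recognition research, the usual approach is to compute Mel-frequency cepstral coefficients (MFCC) directly from both the training and the test utterances and to use them as the speech features. In real-world conditions, however, background noise can severely degrade speaker recognition; examples include the sounds of movement, a running engine, changes in speed, braking, and collisions. This thesis therefore introduces the wavelet transform. Using the wavelet transform, each speech frame is decomposed into frequency sub-bands, an energy feature is computed for each selected band, and a speaker model is built from these features. The resulting system extracts speaker features more accurately and yields a more reliable speaker model. Finally, the system is evaluated on 2,590 speech files provided by 10 speakers, each of whom read the Chinese digits (0-9) several times. About 200 files per speaker are used as reference data and the remaining files as test data. In tests with rapidly changing background noise levels, the wavelet-based speaker recognition system improves the recognition rate by about 2% to 5% over the conventional method.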
This thesis proposes a speaker recognition system based on the wavelet transform. In recent years, research on speaker features has usually adopted Mel-frequency cepstral coefficients (MFCC). In real life, however, background noise can affect speaker recognition, for example the sounds of movement, a running engine, speed changes, braking, and impacts. This thesis introduces the wavelet transform: each frame is decomposed into frequency bands, a feature parameter is calculated for each individual band, and a speaker model is built from these parameters, which gives a more reliable speaker model. In the speaker recognition experiment, we use Chinese digit (0-9) utterances from 10 speakers, each of whom speaks about 20 times. About 200 files per speaker are used as reference data, and the rest are used as test data. The system is tested under variable background noise levels. We find that the wavelet-based speaker recognition system is more effective and efficient than traditional speaker recognition, improving the recognition rate by about 2% to 5%.
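The band-splitting step described in the abstract can be illustrated with a minimal sketch. This record does not state the wavelet family, the number of decomposition levels, or the exact energy definition used in the thesis, so the Haar filters, the three levels, and the log-energy feature below are all assumptions, and the function names `haar_dwt` and `subband_log_energies` are hypothetical.

```python
import math


def haar_dwt(signal):
    """One level of the Haar DWT: returns (approximation, detail) coefficients."""
    signal = list(signal)
    if len(signal) % 2:
        signal.append(signal[-1])  # pad odd-length input by repeating the last sample
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail


def subband_log_energies(frame, levels=3, floor=1e-10):
    """Decompose one frame and return one log energy per sub-band.

    Bands are ordered from the lowest frequency (final approximation)
    to the highest frequency (first-level detail). Haar filters, the
    number of levels, and the log-energy feature are assumptions made
    for illustration only.
    """
    bands = []
    approx = list(frame)
    for _ in range(levels):
        approx, detail = haar_dwt(approx)
        bands.append(detail)   # detail band of the current level
    bands.append(approx)       # remaining low-frequency band
    bands.reverse()            # lowest band first
    return [math.log(sum(c * c for c in band) + floor) for band in bands]


# Example: a 256-sample frame containing a low-frequency sinusoid.
frame = [math.sin(2 * math.pi * 5 * n / 256) for n in range(256)]
features = subband_log_energies(frame, levels=3)
print(len(features))  # one feature per sub-band: levels + 1
```

In a full system, one such feature vector per frame would take the place of, or sit alongside, the MFCC vector that is passed to the speaker model. Because the Haar transform is orthonormal, the band energies of an even-length frame sum to the energy of the frame itself, which is a convenient sanity check.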
Acknowledgments
Abstract in Chinese
Abstract in English
Contents
List of figures
List of tables
1. Introduction
1.1 Motivation
1.2 Introduction of speaker recognition
1.3 Introduction of Relative technology for speaker recognition
1.4 Thesis organization
2. Review Speaker Identification Recognition
2.1 Speaker Feature Extraction
2.1.1 Pre-emphasis
2.1.2 Framing
2.1.3 Windowing
2.1.4 Fast Fourier Transform (FFT)
2.1.5 Mel-Frequency cepstral coefficient (MFCC)
2.1.6 Logarithm Transform and Discrete Cosine Transform (DCT)
2.1.7 Energy
2.2 Recognition algorithm: Gaussian Mixture Model
2.2.1 K-means
2.2.2 Model describe
2.2.3 Model Parameter Estimation
2.2.4 Speaker Recognition
3. Relative technologies for speaker recognition
3.1 Wavelet Transform
3.2 Temporal filter
3.2.1 Principal Component Analysis (PCA)
4. Experiments and Results
4.1 System specification
4.2 Basic experiments and results
4.2.1 Gaussian mixture experiment and result
4.2.2 Basic DWT experiment and result
4.2.3 Basic PCA experiment and result
4.3 Noisy experiments and results
4.3.1 Noisy DWT experiment for two orders and result
4.3.2 Noisy DWT experiment for other orders and result
4.3.3 Noisy DWT and PCA experiment and result
4.4 Conclusions and results
5. Conclusions
Bibliography
[1] D. A. Reynolds, "An overview of automatic speaker recognition technology," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '02), vol. 4, 2002.
[2] C. Seo, K. Y. Lee, and J. Lee, "GMM based on local PCA for speaker identification," Electronics Letters, vol. 37, no. 24, 22 Nov. 2001.
[3] R. J. Mammone, X. Zhang, and R. P. Ramachandran, "Robust speaker recognition: A feature-based approach," IEEE Signal Processing Mag., vol. 13, pp. 58-71, 1996.
[4] R. J. Mammone, X. Zhang, and R. P. Ramachandran, "Robust speaker recognition: A feature-based approach," IEEE Signal Processing Mag., vol. 13, pp. 58-71, 1996.
[5] Z. X. Yuan, B. L. Xu, and C. Z. Yu, "Binary quantization of feature vectors for robust text-independent speaker identification," IEEE Trans. Speech and Audio Processing, vol. 7, no. 1, Jan. 1999.
[6] G. Velius, "Variants of cepstrum based speaker identity verification," in Proc. ICASSP, pp. 583-586, 1988.
[7] B. Yegnanarayana, S. R. M. Prasanna, J. M. Zachariah, and C. S. Gupta, "Combining evidence from source, suprasegmental and spectral features for a fixed-text speaker verification system," vol. 2, 7-11 June 1992.
[8] J. P. Campbell, Jr., "Speaker recognition: A tutorial," Proc. of the IEEE, vol. 85, no. 9, pp. 1437-1462, Sept. 1997.
[9] L. Rudasi and S. A. Zahorian, "Text-independent talker identification with neural networks," in Proc. IEEE ICASSP, May 1991, pp. 389-392.
[10] H. Gish, N. Schmidt, and R. Schwartz, "Text-independent speaker identification," IEEE Signal Processing Magazine, pp. 18-21, Oct. 1994.
[11] D. Reynolds and R. Rose, "Robust text-independent speaker identification using Gaussian mixture speaker models," IEEE Transactions on Speech and Audio Processing, vol. 3, no. 1, 1995.
[12] S. Furui, "Speaker-independent isolated word recognition using dynamic features of speech spectrum," IEEE Transactions on Acoustics, Speech, and Signal Processing, February 1986.
[13] G. Strang and T. Nguyen, Wavelets and Filter Banks. Wellesley-Cambridge Press, New York, 1996.
[14] E. I. Bovbel, I. E. Kheidorov, and Y. A. Chaikou, "Wavelet-based speaker identification," in Proc. 14th International Conference on Digital Signal Processing (DSP 2002), vol. 2, 2002.
[15] J.-W. Hung and L. S. Lee, "Comparative analysis for data-driven temporal filters obtained via principal component analysis (PCA) and linear discriminant analysis (LDA) in speech recognition," Eurospeech 2001.
[16] N.-C. Wang, J.-W. Hung, and L. S. Lee, "Data-driven temporal filters based on multi-eigenvector for robust features in speech recognition," ICASSP 2003.
[17] H. Hermansky and N. Morgan, "RASTA processing of speech," IEEE Transactions on Speech and Audio Processing, vol. 2, no. 4, Oct. 1994.
[18] A. A. Garcia and R. J. Mammone, "Channel-robust speaker identification using modified-mean cepstral mean normalization with frequency warping," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '99), vol. 1, pp. 325-328, 15-19 March 1999.
[19] J. B. Allen, "Cochlear modeling," IEEE Acoust., Speech, Signal Processing Mag., vol. 2, pp. 3-29, 1985.
[20] D. O'Shaughnessy, Speech Communication. Reading, MA: Addison-Wesley, 1987, p. 150.
[21] J. C. Bezdek, "Fuzzy mathematics in pattern classification," Ph.D. thesis, Applied Math. Center, Cornell University, Ithaca, 1973.
[22] T. K. Moon, "The expectation-maximization algorithm," IEEE Signal Processing Magazine, Nov. 1996.