
National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)

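Graduate student: 蔡宗軒
Graduate student (English name): Zong-Syuan Tsai
Title: 長時間平均頻譜與語音內容分布的語者辨識系統
Title (English): Speaker Identification System based on Long Term Average Spectrum and Speech Content Distribution
Advisor: 陳永耀
Advisor (English name): Yung-Yaw Chen
Oral examination committee: 顏家鈺, 傅立成, 連豊力
Oral examination date: 2012-07-26
Degree: Master's
Institution: National Taiwan University
Department: Graduate Institute of Electrical Engineering
Field of study: Engineering
Discipline: Electrical Engineering and Computer Science
Thesis type: Academic thesis
Year of publication: 2012
Graduating academic year: 100 (Republic of China calendar; the 2011-2012 academic year)
Language: English
Number of pages: 100
Keywords (Chinese): 長時間平均頻譜 (long-term average spectrum), 頻譜合成 (spectrum synthesis), 語者辨識 (speaker identification)
Keywords (English): Long Term Average Spectrum, Spectrum synthesis, Speaker identification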
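Timbre is an important feature that humans use to tell voices apart, so describing differences in timbre is an important topic in speaker identification. This study proposes a timbre feature that considers both the long-term average spectrum and the distribution of speech content, and applies this feature to a speaker identification system.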
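The long-term average spectrum (LTAS) combines the effects of speech content and of speaker characteristics, so the LTAS of a single speaker does not show a consistent pattern. This directly limits the identification rate of speaker identification systems that use the LTAS as their feature.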
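For reference, the LTAS of an utterance can be sketched as the time average of its short-time power spectra. The block below is a minimal NumPy sketch under assumed settings: the 512-sample Hann-windowed frames, the 50% overlap, and the function name `long_term_average_spectrum` are illustrative choices, not the parameters used in the thesis.

```python
import numpy as np


def long_term_average_spectrum(signal, sample_rate, frame_len=512, hop=256):
    """Return (freqs, ltas_db): the mean short-time power spectrum of a signal.

    Assumed, hypothetical settings: Hann window, frame_len samples per frame,
    hop samples between frame starts.
    """
    signal = np.asarray(signal, dtype=float)
    if len(signal) < frame_len:
        raise ValueError("signal is shorter than one frame")
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Stack overlapping frames into an (n_frames, frame_len) matrix
    frames = np.stack([signal[i * hop : i * hop + frame_len] for i in range(n_frames)])
    # One-sided power spectrum of every windowed frame
    power = np.abs(np.fft.rfft(frames * window, axis=1)) ** 2
    # Average over time in the linear power domain, then convert to dB
    ltas = power.mean(axis=0)
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sample_rate)
    ltas_db = 10.0 * np.log10(ltas + 1e-12)  # small floor avoids log(0)
    return freqs, ltas_db
```

For a pure 1 kHz tone sampled at 16 kHz, the peak of the returned LTAS falls in the 1000 Hz bin. Averaging in linear power before taking the logarithm keeps the LTAS a true mean of frame energies.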
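To account for the effect of speech content on the LTAS and thereby raise the identification rate, this thesis proposes the concept of a pseudo long-term average spectrum (pseudo LTAS). First, the Mandarin phonemes with sufficient influence on the spectrum are analyzed and identified, and an average spectrum is built for each of these phonemes and stored in the speaker database. When an unknown test speech signal is input, the system first recognizes its speech content and then, according to that content, synthesizes from the speaker database a corresponding pseudo LTAS for every speaker. Because the pseudo LTAS has the same pronunciation content as the average spectrum of the test signal, using the pseudo LTAS as the decision basis for speaker identification performs better than using an LTAS that ignores the effect of content.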
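The synthesis step above can be sketched as a weighted average of each speaker's stored phoneme spectra, with weights taken from the phoneme distribution of the recognized test content. This is a minimal sketch under stated assumptions: the database layout (`{speaker: {phoneme: mean power spectrum}}`), the use of phoneme occurrence counts as weights, and the helper names `phoneme_distribution` and `pseudo_ltas` are all hypothetical; the thesis's own weighting (Sections 4.2.2 and 4.2.3) may differ.

```python
from collections import Counter

import numpy as np


def phoneme_distribution(phonemes, known):
    """Relative frequency of each modeled phoneme in the recognized content.

    Phonemes not in `known` (e.g. ones judged not influential) are ignored.
    """
    counts = Counter(p for p in phonemes if p in known)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no modeled phoneme in the recognized content")
    return {p: c / total for p, c in counts.items()}


def pseudo_ltas(speaker_db, phonemes):
    """Synthesize one pseudo LTAS (in dB) per speaker.

    speaker_db: {speaker: {phoneme: mean power spectrum (linear, 1-D array)}}
    phonemes:   recognized phoneme labels of the test utterance
    """
    result = {}
    for speaker, spectra in speaker_db.items():
        weights = phoneme_distribution(phonemes, spectra.keys())
        # Weighted average in the linear power domain
        mix = sum(w * np.asarray(spectra[p], dtype=float) for p, w in weights.items())
        result[speaker] = 10.0 * np.log10(mix + 1e-12)
    return result
```

For example, if the recognizer returns `["a", "a", "a", "s"]`, every speaker's pseudo LTAS is 0.75 of that speaker's `"a"` spectrum plus 0.25 of their `"s"` spectrum, so all candidates are compared on the same content mix as the test utterance.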
Finally, using the pseudo LTAS, this thesis achieves a speaker identification rate of 94.2%.


Timbre is an important characteristic by which humans distinguish one another's voices. This thesis proposes a timbre feature that considers both the Long Term Average Spectrum (LTAS) and the distribution of speech content, and applies it to a speaker identification system.
The LTAS is influenced by both the characteristics of the speaker and the speech content, so the same speaker can still produce inconsistent LTAS patterns. This inconsistency directly reduces the accuracy of speaker identification that uses the LTAS as its feature.
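One way to see why content makes the LTAS inconsistent: assume the LTAS is the mean of the frame power spectra $P_t(f)$ over $N$ frames, and that each frame can be labeled with the phoneme $p$ it belongs to, with $N_p$ frames of phoneme $p$. Grouping the sum by phoneme gives a content-weighted mixture of per-phoneme average spectra $\bar{S}_p(f)$; the last line shows the corresponding content-matched synthesis for speaker $s$. This is an illustrative derivation under these assumptions, not the thesis's own formula.

```latex
\begin{aligned}
\mathrm{LTAS}(f) &= \frac{1}{N}\sum_{t=1}^{N} P_t(f)
  = \sum_{p} \frac{N_p}{N}\,\bar{S}_p(f),
  \qquad \bar{S}_p(f) = \frac{1}{N_p}\sum_{t:\,\mathrm{phone}(t)=p} P_t(f),\\[4pt]
\widehat{\mathrm{LTAS}}_s(f) &= \sum_{p} w_p\,\bar{S}_{s,p}(f),
  \qquad w_p = \frac{N_p}{N}\ \text{measured on the test utterance.}
\end{aligned}
```

For a fixed speaker the per-phoneme spectra $\bar{S}_p$ stay roughly the same, but the weights $N_p/N$ change with what is said, so two utterances by the same speaker can yield different LTAS curves. Reusing the test utterance's weights for every speaker removes that source of variation from the comparison.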
To increase accuracy by accounting for the effect of content, this thesis proposes the Pseudo LTAS. All Taiwanese Mandarin phonemes are analyzed; the influential phonemes are then selected, and their average spectra are derived as the components of the speaker database. When a test speech signal is input, the system recognizes its content and synthesizes, for each speaker, a pseudo LTAS weighted by that content. Because the Pseudo LTAS and the test speech signal share the same content, speaker identification using the Pseudo LTAS as the decision pattern is more accurate than identification using an LTAS that ignores the influence of content.
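The decision step can be sketched as scoring the test utterance's LTAS against each speaker's pseudo LTAS and choosing the best match. The thesis defines its own measure in Section 4.3; the Pearson correlation used here, and the helper names `spectral_correlation` and `identify_speaker`, are assumptions chosen only to make the sketch concrete.

```python
import numpy as np


def spectral_correlation(a, b):
    """Pearson correlation between two dB spectra (assumed similarity measure).

    Subtracting each spectrum's mean makes the score insensitive to an
    overall level offset, such as a different recording gain.
    """
    return float(np.corrcoef(a, b)[0, 1])


def identify_speaker(test_ltas_db, pseudo_ltas_by_speaker):
    """Return (best_speaker, scores): the speaker whose pseudo LTAS is most similar."""
    scores = {
        speaker: spectral_correlation(test_ltas_db, candidate)
        for speaker, candidate in pseudo_ltas_by_speaker.items()
    }
    best = max(scores, key=scores.get)
    return best, scores
```

A candidate that differs from the test LTAS only by a constant dB offset scores a correlation of 1.0, so a louder or quieter recording of the same speaker is not penalized. A distance such as the mean squared difference in dB would treat that offset as a mismatch.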
The speaker identification system using the Pseudo LTAS achieves an accuracy of 94.2%.


Content
Acknowledgements (誌謝)
Abstract in Chinese (摘要)
Abstract
Content
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Motivation
1.2 Problem Definition
1.3 Proposed Approach
1.4 Thesis Overview
Chapter 2 Speaker Identification
2.1 Speech Production
2.2 Architecture of Speaker Identification
2.2.1 Algorithm of Mel-Frequency Cepstral Coefficient
2.2.2 Long Term Average Spectrum
Chapter 3 Mandarin Phoneme Classification
3.1 Consonant Classification Mechanism
3.2 Vowel Classification Mechanism
3.3 Summary
Chapter 4 Speaker Identification using Pseudo Long Term Average Spectrum
4.1 Database
4.1.1 Training Data Design
4.1.2 Key Component Building
4.2 Pseudo Long Term Average Spectrum
4.2.1 Speech Recognition
4.2.2 Speech Phoneme Distribution
4.2.3 Weighted Average
4.3 Similarity Measurement
4.4 Summary
Chapter 5 Experimental Results
5.1 Phoneme Classification Result
5.1.1 Classification Results of Consonants
5.1.2 Classification Results of Vowel
5.1.3 Summary
5.2 Speaker Identification Accuracy
5.3 Comparison
5.4 Summary
Chapter 6 Conclusions and Future Work
Appendix A: Content of Training Data
Appendix B: Content of Testing Data
References


References
[1] Available: http://www.busim.ee.boun.edu.tr/~speech/Speaker_recognition.html
[2] L. Yu, "An Algorithm of Background Reconstruction Based on Morphology and PIC," in International Workshop on Information Security and Application, Qingdao, China, 2009.
[3] S. Furui and S. Saito, "Talker Recognition by Longtime Averaged Speech Spectrum," Electronics and Communications in Japan, vol. 55-A, 1972.
[4] T. F. Cleveland, J. Sundberg, and R. E. Stone, "Long-Term-Average Spectrum Characteristics of Country Singers During Speaking and Singing," Journal of Voice, vol. 15, pp. 54-60, 2001.
[5] D. Z. Borch and J. Sundberg, "Spectral distribution of solo voice and accompaniment in pop music," Logopedics Phoniatrics Vocology, vol. 27, pp. 37-41, 2002.
[6] 林展誼, "探討基於語音長時間平均頻譜之特性" (An investigation of characteristics based on the long-term average spectrum of speech), M.S. thesis, Graduate Institute of Electrical Engineering, National Taiwan University, Taipei, 2011.
[7] R. Martin and P. Vary, Digital Speech Transmission: Enhancement, Coding and Error Concealment. Wiley, 2006.
[8] J. Makhoul, "Linear Prediction: A Tutorial Review," Proceedings of the IEEE, vol. 63, 1975.
[9] T. Kinnunen and H. Li, "An overview of text-independent speaker recognition: From features to supervectors," Speech Communication, vol. 52, pp. 12-40, 2010.
[10] J. P. Campbell, Jr., "Speaker recognition: a tutorial," Proceedings of the IEEE, vol. 85, pp. 1437-1462, 1997.
[11] H. Gish and M. Schmidt, "Text-independent speaker identification," IEEE Signal Processing Magazine, vol. 11, pp. 18-32, 1994.
[12] S. Furui, "Research of individuality features in speech waves and automatic speaker recognition techniques," Speech Communication, vol. 5, pp. 183-197, 1986.
[13] Z. Tychtl and J. Psutka, "Speech production based on the mel-frequency cepstral coefficients," presented at the Sixth European Conference on Speech Communication and Technology, 1999.
[14] 王小川, 語音訊號處理 (Speech Signal Processing). 全華圖書股份有限公司, 2009.
[15] V. M. O. Barrichelo, R. J. Heuer, C. M. Dean, and R. T. Sataloff, "Comparison of Singer's Formant, Speaker's Ring, and LTA Spectrum Among Classical Singers and Untrained Normal Speakers," Journal of Voice, vol. 15, pp. 344-350, 2001.
[16] E. Mendoza, N. Valencia, J. Munoz, and H. Trujillo, "Differences in voice quality between men and women: Use of the long-term average spectrum (LTAS)," Journal of Voice, vol. 10, pp. 59-66, 1996.
[17] P. White, "Long-term average spectrum (LTAS) analysis of sex- and gender-related differences in children's voices," Logopedics Phoniatrics Vocology, vol. 26, pp. 97-101, 2001.
[18] T. Kinnunen, V. Hautamäki, and P. Fränti, "On the Fusion of Dissimilarity-Based Classifiers for Speaker Identification," 2003.
[19] S. Pauk, "Use of Long-Term Average Spectrum for Automatic Speaker Recognition," Department of Computer Science, University of Joensuu, 2006.
[20] D. Sergeant and G. F. Welch, "Age-Related Changes in Long-Term Average Spectra of Children's Voices," Journal of Voice, vol. 22, pp. 658-670, 2008.
[21] IBM ViaVoice. http://shopping.pchome.com.tw/?m=item&f=exhibit&IT_NO=AIAB08-A38040454&SR_NO=AIAB08
[22] L. Wang, X. Wang, and J. Feng, "Subspace distance analysis with application to adaptive Bayesian algorithm for face recognition," Pattern Recognition, vol. 39, pp. 456-464, 2006.
[23] F. Li, Q. Dai, W. Xu, and G. Er, "Weighted Subspace Distance and Its Applications to Object Recognition and Retrieval With Image Sets," IEEE Signal Processing Letters, vol. 16, pp. 227-230, 2009.
[24] F. van der Meer, "The effectiveness of spectral similarity measures for the analysis of hyperspectral imagery," International Journal of Applied Earth Observation and Geoinformation, vol. 8, pp. 3-17, 2006.
[25] S. Nakagawa, L. Wang, and S. Ohtsuka, "Speaker Identification and Verification by Combining MFCC and Phase Information," IEEE Transactions on Audio, Speech, and Language Processing, vol. 20, pp. 1085-1095, 2012.
[26] M. Grimaldi and F. Cummins, "Speaker Identification Using Instantaneous Frequencies," IEEE Transactions on Audio, Speech, and Language Processing, vol. 16, pp. 1097-1111, 2008.
[27] A. Rosenberg and K. Shipley, "Talker recognition in tandem with talker-independent isolated word recognition," IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 33, pp. 574-586, 1985.
[28] A. E. Rosenberg and F. K. Soong, "Evaluation of a vector quantization talker recognition system in text independent and text dependent modes," Computer Speech & Language, vol. 2, pp. 143-157, 1987.
[29] Y. Ariki, S. Tagashira, and M. Nishijima, "Speaker recognition and speaker normalization by projection to speaker subspace," in Proc. 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP-96), 1996, vol. 1, pp. 319-322.
[30] K. Saeed and M. K. Nammous, "A Speech-and-Speaker Identification System: Feature Extraction, Description, and Classification of Speech-Signal Image," IEEE Transactions on Industrial Electronics, vol. 54, pp. 887-897, 2007.
[31] V. R. Apsingekar and P. L. De Leon, "Speaker Model Clustering for Efficient Speaker Identification in Large Population Applications," IEEE Transactions on Audio, Speech, and Language Processing, vol. 17, pp. 848-853, 2009.


