National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)

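Graduate student: 黃雪珠 (Huang Hsueh Chu)
Thesis title: 基於小波轉換之語者識別分析 (A Study of Speaker Identification Based on Wavelet Transform)
Advisor: 王昭男 (Wang Chao Nan)
Degree: Master's
Institution: National Taiwan University
Department: Department of Engineering Science and Ocean Engineering
Discipline: Engineering
Field: General Engineering
Thesis type: Academic thesis
Year of publication: 2003
Graduation academic year: 92 (ROC calendar)
Language: Chinese
Number of pages: 70
Keywords: Wavelet analysis; Mel Frequency Cepstrum Coefficient (MFCC); Vector Quantization (VQ); Identification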
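This thesis uses a person's voiceprint for identity recognition. Many factors affect how a person's voice varies, such as environment, age, and gender, so the extracted feature parameters must capture the vocal characteristics unique to an individual. In the feature-extraction procedure, this thesis adds wavelet analysis to the commonly used Mel-scale cepstrum coefficients. Because wavelet analysis offers multiresolution in the time-frequency domain, each frequency band can be analyzed appropriately.
This thesis first uses the properties of wavelet analysis to recognize telephone touch-tones and musical instrument tones. Because these are regular signals, good recognition results are obtained. The human voice, by contrast, varies widely and is affected by many factors, and the main focus of this thesis is speaker identification. Speaker identification determines a speaker's identity by analyzing the voice, and the identification studied here is text-independent.
This thesis combines multiresolution wavelet analysis with the commonly used Mel frequency cepstrum coefficients (MFCC) as the method for computing speaker features, and encodes the feature vectors by vector quantization (VQ) to build a speaker database. During identification, the input signal is compared against the database by VQ distortion, and the entry with the smallest distortion is taken as the identification result. The experiments used speech data from 12 speakers for training and identification. With every condition at its best setting, an identification rate of 92.91% was obtained. Finally, suggestions are offered for methods that could improve the identification rate.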
The main aim of this thesis is to identify a person from the characteristics of his or her voice. Human speech conveys different types of information, such as the meaning of words, emotion, and age. The features extracted from the voice signal must therefore represent the personal characteristics of the speaker accurately. Wavelet analysis is well suited to analyzing critical bands because of its multiresolution property in the time-frequency domain. In extracting the feature parameters, we therefore use the wavelet transform to separate the signal into different frequency bands and then use the Mel Frequency Cepstrum Coefficients (MFCC) method to carry out the feature extraction.
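The band-splitting step described above can be illustrated with a minimal sketch. This record does not state which wavelet family or how many decomposition levels the thesis used, so the Haar wavelet, the three levels, and the function names `haar_step` and `haar_subbands` are all assumptions chosen for brevity, not the author's method.

```python
import math

def haar_step(signal):
    """One level of the orthonormal Haar DWT: returns (approximation, detail)."""
    s = 1.0 / math.sqrt(2.0)
    pairs = range(0, len(signal) - 1, 2)
    approx = [(signal[i] + signal[i + 1]) * s for i in pairs]  # low-pass half
    detail = [(signal[i] - signal[i + 1]) * s for i in pairs]  # high-pass half
    return approx, detail

def haar_subbands(signal, levels):
    """Split a signal into [detail_1, ..., detail_L, approx_L].

    detail_1 covers the highest-frequency band; each further level halves
    the bandwidth, which is the multiresolution property in practice.
    """
    bands = []
    current = list(signal)
    for _ in range(levels):
        if len(current) < 2:
            break
        current, detail = haar_step(current)
        bands.append(detail)
    bands.append(current)
    return bands

# Hypothetical 64-sample frame: a 5-cycle sine standing in for speech.
frame = [math.sin(2 * math.pi * 5 * n / 64) for n in range(64)]
bands = haar_subbands(frame, levels=3)
print([len(b) for b in bands])  # [32, 16, 8, 8]
```

In a pipeline like the one the abstract outlines, a cepstral feature routine would then be applied to these subbands; that step is omitted here because the record gives no parameters for it.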
First, we analyzed touch-tone phone signals and musical instrument sounds using the wavelet transform. Because both are regular signals, we obtained a high identification rate. We then examined text-independent speaker identification in detail.
We applied the wavelet transform and the MFCC method to the speech signals to build the feature vectors of each speaker. We then modeled the extracted features with vector quantization (VQ) to reduce the amount of data. In the identification step, the extracted features were compared against the models stored in the speaker database. Our system was trained on 12 speakers and achieved a recognition rate of 92.91%. We close with a short discussion and conclusions.
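The VQ modeling and matching steps can be sketched roughly as follows. The abstract does not say how the codebooks were trained or what distortion measure was used, so the plain k-means training, the squared Euclidean distance, the codebook size, and every name below (`train_codebook`, `average_distortion`, `identify`, `speaker_a`, `speaker_b`) are assumptions for illustration. The two-dimensional random vectors stand in for real wavelet and MFCC features.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def train_codebook(vectors, size, iterations=20, seed=0):
    """Build a VQ codebook with plain k-means (an assumed training method)."""
    rng = random.Random(seed)
    codebook = [list(v) for v in rng.sample(vectors, size)]
    for _ in range(iterations):
        clusters = [[] for _ in codebook]
        for v in vectors:
            nearest = min(range(len(codebook)), key=lambda k: sq_dist(v, codebook[k]))
            clusters[nearest].append(v)
        for k, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous codeword
                dim = len(members[0])
                codebook[k] = [sum(m[d] for m in members) / len(members) for d in range(dim)]
    return codebook

def average_distortion(vectors, codebook):
    """Mean distance from each vector to its nearest codeword."""
    return sum(min(sq_dist(v, c) for c in codebook) for v in vectors) / len(vectors)

def identify(vectors, codebooks):
    """Return the speaker whose codebook gives the smallest average distortion."""
    return min(codebooks, key=lambda name: average_distortion(vectors, codebooks[name]))

# Toy data: two hypothetical speakers with well-separated feature clouds.
rng = random.Random(1)
train = {
    "speaker_a": [[rng.gauss(0.0, 0.3), rng.gauss(0.0, 0.3)] for _ in range(50)],
    "speaker_b": [[rng.gauss(3.0, 0.3), rng.gauss(3.0, 0.3)] for _ in range(50)],
}
codebooks = {name: train_codebook(vecs, size=4) for name, vecs in train.items()}
test_utterance = [[rng.gauss(3.0, 0.3), rng.gauss(3.0, 0.3)] for _ in range(10)]
print(identify(test_utterance, codebooks))
```

Averaging the distortion over all frames of an utterance, rather than matching a fixed word sequence, is what lets a scheme like this work in the text-independent setting.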
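Chapter 1 Introduction
1.1 Research Motivation
1.2 Previous Work and Research Direction
1.3 Chapter Overview
Chapter 2 Introduction to Wavelets
2.1 Window Functions
2.2 Integral Wavelet Transform
2.3 Discrete Wavelet Transform
2.4 Multiresolution Spaces
Chapter 3 Analysis of Telephone Touch-Tones and Musical Instrument Tones
3.1 Telephone Touch-Tones
3.1.1 Introduction
3.1.2 Wavelet Analysis
3.2 Musical Instrument Tones
3.2.1 Timbre, Frequency, and Pitch
3.2.2 Fundamental and Overtones
3.2.3 Wavelet Analysis
3.3 Signal Recording and Analysis Results
Chapter 4 Speaker Identification
4.1 Principles of Speaker Identification
4.2 Speaker Feature Parameter Extraction
4.2.1 Speech Signal Segmentation
4.2.2 Wavelet Analysis
4.2.3 Mel Cepstrum Coefficients
4.2.3.1 Fast Fourier Transform
4.2.3.2 Mel Frequency Conversion
4.2.3.3 Cepstrum Coefficients
4.3 Speaker Feature Parameter Matching
4.3.1 Introduction to Vector Quantization
4.3.2 Speaker Feature Vector Encoding
4.3.3 Speaker Identification
4.4 Speaker Identification System Design
4.4.1 Speech Database
4.4.2 System Parameters
4.5 Identification Results
4.5.1 Identification Rate Using Vector Quantization
4.5.2 Training Utterance Length versus Identification Rate
4.5.3 Test Utterance Length versus Identification Rate
4.5.4 Wavelet Order versus Identification Rate
4.6 Discussion
Chapter 5 Conclusions and Future Research Directions
References
Appendix A Mel-Scale Cepstrum Coefficients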