National Digital Library of Theses and Dissertations in Taiwan

Detailed Record

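Graduate student: Shi-Han Chen (陳思翰)
Thesis title: Speaker Recognition Using Coded Speech (編碼語音下的說話人辨認)
Advisor: Hsiao-Chuan Wang (王小川)
Degree: Master's
Institution: National Tsing Hua University
Department: Department of Electrical Engineering
Field of study: Engineering
Discipline: Electrical Engineering and Computer Science
Thesis type: Academic thesis
Year of publication: 2003
Graduation academic year: 91 (ROC calendar, i.e., the 2002–2003 academic year)
Language: English
Number of pages: 76
Keywords (Chinese): speaker recognition, coding, residual signal, prosodic parameters
Keywords (English): speaker recognition; coded; residue; residual; prosodic; feature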
Usage statistics:
  • Cited: 0
  • Views: 125
  • Downloads: 0
  • Bookmarks: 2
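With the rapid growth of the Internet and mobile telephony in recent years, voice communication between people has gradually become digital. For example, the speech carried by the widespread GSM mobile phone system and by the increasingly adopted Internet telephony is digitized by a speech coder. At the same time, more and more interactive network applications have appeared: people can now transfer money with a mobile phone or book train tickets by telephone. In such high-security applications, verifying a user's true identity becomes very important. Because users operate these systems through a telephone interface, using the characteristics of a person's voice for identity verification is a convenient approach. Unfortunately, previous studies have found that speech coders distort the speech signal, which in turn lowers recognition accuracy. This thesis approaches the problem from the standpoint of speech features, processing the speech signals produced by two speech coders, G.729A and GSM. First, we use the residual signal to improve recognition accuracy; we then add prosodic features to reduce the effect of the speech coder on the feature parameters. Experimental results show that the residual signal is very helpful for speaker recognition and that prosodic features improve accuracy further. The gain is largest under mismatched coding conditions, where the robustness of the prosodic features raises accuracy substantially. Finally, we propose a mathematical model better suited to the prosodic features to raise recognition accuracy further. Compared with the previous method, this improved model performs better in both speaker identification and speaker verification.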
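
The residual (residue) signal mentioned above is what remains after a speech frame is inverse-filtered by its own linear-prediction (LPC) polynomial. The following is a minimal Python sketch of the textbook autocorrelation method, not the thesis's actual front end. The function names, the default order of 10 (chosen to match the 10th-order LP analysis used in G.729), and the omission of pre-emphasis and windowing are assumptions made for brevity.

```python
def autocorrelation(frame, order):
    """Biased autocorrelation r[0..order] of one analysis frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + k] for i in range(n - k))
            for k in range(order + 1)]


def levinson_durbin(r, order):
    """Levinson-Durbin recursion on autocorrelation values r[0..order].

    Returns (a, err): a = [1, a1, ..., ap] are the coefficients of the inverse
    filter A(z) = 1 + a1 z^-1 + ... + ap z^-p, and err is the final
    prediction-error energy.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    if err == 0.0:                 # silent frame: keep A(z) = 1
        return a, 0.0
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err             # i-th reflection coefficient
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
        if err <= 0.0:             # frame is perfectly predictable; stop early
            break
    return a, err


def lpc_residual(frame, order=10):
    """Inverse-filter a frame with its own LPC polynomial.

    e[n] = sum_{k=0}^{p} a[k] * x[n - k], with samples before the frame taken as 0.
    """
    a, _ = levinson_durbin(autocorrelation(frame, order), order)
    return [sum(a[k] * frame[n - k] for k in range(order + 1) if n - k >= 0)
            for n in range(len(frame))]
```

For example, for the decaying sequence x[n] = 0.9**n over 200 samples, a first-order analysis gives a1 ≈ -0.9. The residual is then essentially an impulse: 1 at n = 0 and close to 0 afterwards. This shows how the residual removes what the short-term predictor can explain and leaves the excitation.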

The performance of speaker recognition degrades when speech is processed by a speech coder. In this thesis, we evaluate speaker recognition performance on G.729A- and GSM-coded speech and use the residue signal and prosodic features to improve it. The residue signal is shown to be very useful for speaker recognition, and the prosodic features are shown to be highly robust to codec distortion. Moreover, we propose an improved prosodic model that outperforms the previously proposed model in both speaker identification and speaker verification.
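
The abstract distinguishes speaker identification from speaker verification. The sketch below illustrates the two decision rules under one assumption: each speaker is modeled by a single diagonal-covariance Gaussian, which is a one-component special case of the Gaussian mixture models common in text-independent speaker recognition. The thesis's actual acoustic and prosodic models are not described here. The function names, the background model, and the zero threshold are hypothetical.

```python
import math


def diag_gauss_loglik(x, mean, var):
    """Log density of feature vector x under a diagonal-covariance Gaussian."""
    return -0.5 * sum(math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))


def avg_loglik(frames, model):
    """Average per-frame log-likelihood of a sequence of feature vectors."""
    mean, var = model
    return sum(diag_gauss_loglik(f, mean, var) for f in frames) / len(frames)


def identify(frames, speaker_models):
    """Closed-set identification: return the enrolled speaker with the best score."""
    return max(speaker_models, key=lambda s: avg_loglik(frames, speaker_models[s]))


def verify(frames, claimed_model, background_model, threshold=0.0):
    """Verification: accept the identity claim if the log-likelihood ratio
    between the claimed speaker and a background model exceeds threshold."""
    llr = avg_loglik(frames, claimed_model) - avg_loglik(frames, background_model)
    return llr > threshold, llr
```

Identification is a closed-set argmax over the enrolled speakers. Verification is a binary test of one identity claim against a background model, so it needs a threshold. That threshold is typically tuned on development data to trade false acceptances against false rejections.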

ABSTRACT
LIST OF FIGURES
LIST OF TABLES
CHAPTER 1 INTRODUCTION
CHAPTER 2 INTRODUCTION TO SPEECH CODECS
2.1 Full rate GSM speech coder
2.2 G.729 speech coder
2.3 G.729A speech coder
CHAPTER 3 INTRODUCTION TO SPEAKER RECOGNITION
3.1 Model description
3.2 Speaker identification
3.3 Speaker verification
3.4 Speech analysis
3.5 Channel compensation
3.5.1 Channel Mean Normalization
3.5.2 Delta Cepstrum
CHAPTER 4 SPEAKER RECOGNITION USING CODED SPEECH
4.1 Usefulness of the residue
4.2 Codec distortion in acoustic features and prosodic features
4.3 Extraction of the prosodic features
4.4 Modeling of the prosodic features
CHAPTER 5 EXPERIMENTS AND RESULTS
5.1 Speaker identification
5.1.1 Database description
5.1.2 Baseline experiments
5.1.3 Recognition with codec parameters: without the use of the residue
5.1.4 Recognition with codec parameters: with the use of the residue
5.1.5 Recognition using decoded speech: with the use of the residue
5.1.6 Recognition with the use of prosodic features
5.2 Speaker verification
5.2.1 Database description
5.2.2 Baseline experiments
5.2.3 Recognition using prosodic features
5.2.4 Recognition using combined features
CHAPTER 6 CONCLUSION
REFERENCES

