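Graduate student: 李孝健 (Shiao-Chien Li)
Thesis title: Applying Eigenvoice Model Adaptation For User Verbal Information Verification (Chinese title: 以特徵聲音調整為主之使用者言語資訊確認技術)
Advisor: 簡仁宗 (Jen-Tzung Chien)
Degree: Master's
Institution: National Cheng Kung University
Department: Department of Computer Science and Information Engineering (Master's and Doctoral Program)
Field: Engineering
Discipline: Electrical and Information Engineering
Thesis type: Academic thesis
Publication year: 2003
Graduation academic year: 91 (ROC calendar, i.e. 2002-2003)
Language: Chinese
Pages: 78
Keywords (Chinese): speaker adaptation; speaker recognition; eigenvoice; user verbal information verification
Keywords (English): speaker adaptation; speaker recognition; verbal information verification; eigenvoice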
Usage statistics:
  • Cited by: 5
  • Views: 161
  • Downloads: 30
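Chinese abstract (translated):
In general, eigenvoice adaptation performs well when only a small amount of training data is available. This thesis proposes an eigenvoice-based speaker adaptation method and applies it to a user verbal information verification system.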
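As background, the sketch below shows the standard eigenvoice construction that this line of work builds on: each training speaker's HMM Gaussian means are concatenated into one supervector, PCA over those supervectors yields the eigenvoices, and a new speaker's supervector is modeled as the mean supervector plus a weighted sum of the leading K eigenvoices. It is a minimal NumPy sketch on synthetic data, not the thesis's implementation; the function names `build_eigenvoices` and `speaker_supervector` are hypothetical.

```python
import numpy as np


def build_eigenvoices(supervectors, num_eigenvoices):
    """Return (mean supervector, eigenvoices) from training speakers.

    supervectors: array of shape (S, D), one row per training speaker,
    formed by concatenating that speaker's HMM Gaussian means.
    """
    sv = np.asarray(supervectors, dtype=float)
    mean_sv = sv.mean(axis=0)
    centered = sv - mean_sv
    # The right singular vectors of the centered data are the principal
    # directions (eigenvectors of the speaker scatter), largest first.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean_sv, vt[:num_eigenvoices]  # shapes (D,), (K, D)


def speaker_supervector(mean_sv, eigenvoices, weights):
    """New speaker = mean supervector + weighted sum of eigenvoices."""
    return mean_sv + np.asarray(weights, dtype=float) @ eigenvoices


# Example: 10 synthetic speakers whose supervectors vary in a 2-D subspace.
rng = np.random.default_rng(0)
basis = rng.normal(size=(2, 6))
speakers = 1.0 + rng.normal(size=(10, 2)) @ basis
mean_sv, eigvoices = build_eigenvoices(speakers, num_eigenvoices=2)
# Project one speaker onto the eigenvoices to get its K = 2 weights.
weights = (speakers[3] - mean_sv) @ eigvoices.T
reconstructed = speaker_supervector(mean_sv, eigvoices, weights)
```

This is why eigenvoice adaptation copes with sparse data: a new speaker is described by only K weights rather than by every Gaussian mean.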
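The method first replaces maximum likelihood (ML) with maximum a posteriori (MAP) as the criterion for estimating the linear combination coefficients of the eigenvectors. Assuming that each combination coefficient is random with a Gaussian prior, we derive a set of MAP estimates. The advantage is that this avoids the large estimation errors that arise when data are insufficient.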
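When each Gaussian mean is linear in the combination weights, the MAP criterion has a closed form. The sketch below assumes the usual eigenvoice parameterization mu_m = mean_m + E_m w, fixed covariances, occupancies already obtained from an alignment (for example forward-backward), and a Gaussian prior w ~ N(w0, Sigma_w). It solves (A + Sigma_w^-1) w = b + Sigma_w^-1 w0, and dropping the prior gives the ML equations A w = b. The function name, argument layout, and prior values are hypothetical; the abstract does not state the thesis's prior hyperparameters.

```python
import numpy as np


def eigen_coefficients(gamma, first_order, cov_inv, eig_blocks,
                       prior_mean=None, prior_cov=None):
    """Estimate eigenvoice weights w from per-Gaussian statistics.

    gamma[m]       : total occupancy of Gaussian m (scalar)
    first_order[m] : sum_t gamma_m(t) * (o_t - mean_m), shape (D,)
    cov_inv[m]     : inverse covariance of Gaussian m, shape (D, D)
    eig_blocks[m]  : eigenvoice components for Gaussian m, shape (D, K)

    prior_cov=None gives the ML (MLED) solution; otherwise the MAP
    solution under the Gaussian prior w ~ N(prior_mean, prior_cov).
    """
    K = eig_blocks[0].shape[1]
    A = np.zeros((K, K))
    b = np.zeros(K)
    for g, f, ci, E in zip(gamma, first_order, cov_inv, eig_blocks):
        A += g * E.T @ ci @ E   # accumulate the data precision in weight space
        b += E.T @ ci @ f       # accumulate the projected first-order statistics
    if prior_cov is None:
        return np.linalg.solve(A, b)  # ML: singular if data are too sparse
    prior_prec = np.linalg.inv(np.asarray(prior_cov, dtype=float))
    w0 = np.zeros(K) if prior_mean is None else np.asarray(prior_mean, dtype=float)
    return np.linalg.solve(A + prior_prec, b + prior_prec @ w0)


# Toy case: one Gaussian, D = 2, K = 1, two frames each 2.0 above the mean.
E = [np.array([[1.0], [0.0]])]
ci = [np.eye(2)]
ml = eigen_coefficients([2.0], [np.array([4.0, 0.0])], ci, E)
mp = eigen_coefficients([2.0], [np.array([4.0, 0.0])], ci, E,
                        prior_mean=[0.0], prior_cov=np.eye(1))
```

With few frames the MAP estimate is pulled toward the prior mean (here 4/3 instead of the ML value 2), and with no data it returns the prior mean where ML would be singular; as occupancy grows the two estimates converge.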
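Another feature of this thesis is that the speech feature vectors aligned to each state are projected, using principal component analysis (PCA), onto a lower-dimensional space whose axes are orthogonal and mutually uncorrelated. The covariance matrix of each state is computed in this space and then transformed back to the original parameter space, giving a speaker-adaptive covariance matrix.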
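The abstract does not say where the PCA axes come from or how the discarded dimensions are handled, so the sketch below is one plausible reading with its assumptions made explicit. The principal axes are taken from the eigen-decomposition of the state's speaker-independent (SI) covariance. Only the variances along the top r axes are re-estimated from the new speaker's frames, because orthogonal axes need a single variance each. The remaining D - r directions keep their SI variances so that the back-transformed matrix stays full rank and invertible. The function name `pca_adapted_covariance` is hypothetical.

```python
import numpy as np


def pca_adapted_covariance(frames, state_mean, si_cov, r):
    """Speaker-adaptive covariance for one HMM state (illustrative sketch).

    frames     : (N, D) adaptation frames aligned to the state
    state_mean : (D,) mean of the state (e.g. the adapted mean)
    si_cov     : (D, D) speaker-independent covariance of the state
    r          : number of principal axes re-estimated from the frames
    """
    # eigh returns eigenvalues in ascending order; sort them descending.
    eigvals, eigvecs = np.linalg.eigh(si_cov)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Project the centered frames onto the top-r principal axes.
    y = (np.asarray(frames, dtype=float) - state_mean) @ eigvecs[:, :r]
    # Re-estimate one variance per retained axis; keep SI variances elsewhere.
    adapted = eigvals.copy()
    adapted[:r] = np.mean(y ** 2, axis=0)
    # Back to the original space: Sigma = V diag(adapted) V^T.
    return (eigvecs * adapted) @ eigvecs.T


si = np.diag([4.0, 1.0, 0.25])
frames = np.array([[2.0, 0, 0], [-2.0, 0, 0], [6.0, 0, 0], [-6.0, 0, 0]])
cov = pca_adapted_covariance(frames, np.zeros(3), si, r=1)
```

Estimating r variances instead of the D(D+1)/2 entries of a full covariance is what makes this feasible with little adaptation data: for example, D = 39 and r = 10 give 10 parameters instead of 780.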
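Experimental results show that both the covariance matrix estimation and the MAP eigen-decomposition further improve the adaptation performance of the eigenvoice algorithm. We also apply the improved eigenvoice adaptation algorithm to a user verbal information verification system, so that the system can adapt an acoustic model for a new speaker immediately after that user's first login. This shortens the process of building a speaker-adaptive model and yields more accurate verification.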
English abstract:
Generally, eigenvoice-based speaker adaptation is effective when adaptation data are scarce. This thesis presents a novel approach to eigenvoice speaker adaptation and applies it to build a verbal information verification (VIV) system for Mandarin Chinese.
First, we present maximum a posteriori eigen-decomposition (MAPED), in which the linear combination coefficients of the eigenvector decomposition are estimated with the MAP criterion. Treating the prior density of each combination coefficient as a Gaussian distribution, we obtain the MAP estimate of the coefficients and use it to carry out MAPED. When data are insufficient, MAPED achieves better estimates than maximum likelihood eigen-decomposition (MLED).
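Under the usual eigenvoice assumptions (Gaussian means linear in the weights, fixed covariances C_m, occupancies gamma_m(t) from an alignment), the MAPED estimate has a closed form. The notation is assumed for illustration rather than copied from the thesis: \bar{\mu}_m is the speaker-independent mean of Gaussian m, E_m holds the K eigenvoice components for that Gaussian, and the prior is w ~ N(w_0, Sigma_w). Setting the gradient of the log posterior to zero and substituting the linear mean model gives:

```latex
\begin{aligned}
\mu_m(\mathbf{w}) &= \bar{\mu}_m + \mathbf{E}_m \mathbf{w},
  \qquad \mathbf{w} \sim \mathcal{N}(\mathbf{w}_0, \Sigma_w) \\
\hat{\mathbf{w}}_{\mathrm{MAP}} &= \arg\max_{\mathbf{w}}
  \bigl[\log p(O \mid \mathbf{w}) + \log p(\mathbf{w})\bigr] \\
0 &= \sum_m \mathbf{E}_m^{\top} C_m^{-1} \sum_t \gamma_m(t)\,\bigl(\mathbf{o}_t - \mu_m(\mathbf{w})\bigr)
   - \Sigma_w^{-1}(\mathbf{w} - \mathbf{w}_0) \\
\Bigl(\sum_m \gamma_m \mathbf{E}_m^{\top} C_m^{-1} \mathbf{E}_m + \Sigma_w^{-1}\Bigr)
  \hat{\mathbf{w}}_{\mathrm{MAP}}
  &= \sum_m \mathbf{E}_m^{\top} C_m^{-1} \sum_t \gamma_m(t)\,(\mathbf{o}_t - \bar{\mu}_m)
   + \Sigma_w^{-1} \mathbf{w}_0,
  \qquad \gamma_m = \sum_t \gamma_m(t)
\end{aligned}
```

Setting \Sigma_w^{-1} = 0 recovers the MLED equations. With few frames, the prior term keeps the linear system well conditioned and pulls the estimate toward w_0, which is the advantage claimed for MAPED over MLED.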
Second, we use principal component analysis (PCA) to project the speaker-specific hidden Markov model (HMM) parameters of each state onto a lower-dimensional orthogonal feature space. We then calculate the HMM covariance matrix from the observations in this new feature space, and obtain the adaptive HMM covariance matrix by transforming it back to the original feature space.
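In equation form, one reading of this step is shown below. The following are assumptions: P_r is the D x r matrix of leading principal axes for state s, P_perp spans the remaining D - r directions with speaker-independent variances Lambda_perp, and mu_s is the state mean over the N_s frames aligned to s.

```latex
\begin{aligned}
\mathbf{y}_t &= \mathbf{P}_r^{\top}(\mathbf{o}_t - \mu_s), \qquad
\Sigma_y = \frac{1}{N_s}\sum_{t \in s} \mathbf{y}_t \mathbf{y}_t^{\top} \\
\hat{\Sigma}_s &= \mathbf{P}_r\, \Sigma_y\, \mathbf{P}_r^{\top}
  + \mathbf{P}_{\perp} \Lambda_{\perp} \mathbf{P}_{\perp}^{\top}
\end{aligned}
```

The second term is an added assumption. Without it, \hat{\Sigma}_s has rank r < D and cannot be inverted when evaluating the HMM likelihood. If only the diagonal of \Sigma_y is kept, because the projected axes are treated as uncorrelated, the estimate needs r parameters instead of r(r+1)/2.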
Experiments show that both the adaptive HMM covariance matrix and MAPED significantly improve the Mandarin speech recognition performance of eigenvoice speaker adaptation. We further apply the proposed eigenvoice adaptation method to build a VIV demonstration system. Its performance is greatly improved by incorporating a speaker enrollment phase in which the HMM parameters of the new speaker are adapted.
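Acknowledgments
Chinese Abstract
Abstract
Table of Contents
List of Figures
List of Tables
List of Symbols
Chapter 1 Introduction
1.1 Introduction
1.2 Research Motivation and Objectives
1.3 Overview of the Research Method
1.4 Chapter Outline
Chapter 2 Overview of Speaker Recognition
2.1 Introduction
2.2 Speaker Identification and Speaker Verification
2.3 Text Dependence
2.3.1 Text-Dependent Systems
2.3.2 Text-Independent Systems
2.4 Review of Related Work
2.5 Gaussian Mixture Models
2.5.1 Overview
2.5.2 Background Model
2.5.3 Cohort Model
2.6 User Verbal Information Verification System
Chapter 3 Speaker Adaptation and Eigenvoice Adaptation
3.1 Introduction
3.2 Maximum Likelihood Linear Regression (MLLR)
3.3 Maximum a Posteriori (MAP) Adaptation
3.4 Eigenvoice Adaptation
Chapter 4 Novel Eigenvoice Adaptation Methods
4.1 Introduction
4.2 Covariance Estimation
4.3 Maximum a Posteriori Eigen-Decomposition
4.4 Combining Eigenvoices with the Verbal Information Verification System
Chapter 5 Experiments
5.1 Experimental Setup
5.2 Experimental Results
5.3 System Demonstration
Chapter 6 Conclusions and Future Work
6.1 Conclusions
6.2 Future Work
References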