
National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)


Detailed Record

Author: Yi-Chen Lyu (呂易宸)
Title: 語音門禁系統 (Speech Access System Based on Speaker Identification)
Advisor: Y.-T. Juang (莊堯棠)
Degree: Master's
University: National Central University (國立中央大學)
Department: Graduate Institute of Electrical Engineering
Discipline: Engineering
Field: Electrical Engineering and Computer Science
Document type: Academic thesis
Publication year: 2011
Graduation academic year: 99 (2010-2011)
Language: Chinese
Pages: 98
Keywords (Chinese): 關鍵字擷取 (keyword spotting), 高斯混合模型 (Gaussian mixture model), 最大事後機率 (maximum a posteriori)
Keywords (English): maximum a posteriori, Gaussian mixture model, keyword spotting
Cited by: 9
Views: 304
Downloads: 0
Bookmarks: 0
Abstract (translated from the Chinese):
This thesis designs a speech recognition system for access control. Speaker identification decides whether an input voice belongs to an authorized user; keyword spotting additionally lets the system recognize the user's name; and text-to-speech synthesis lets the system answer with a simulated human voice rather than plain text. The components are packaged behind a graphical human-machine interface so that the system is easy to operate.
Because an access-control system must work in real time (online), the running time of every method had to be considered: not every technique could be included, since users cannot be kept waiting long for a result. Methods were therefore screened with response time as the primary constraint, which inevitably costs some recognition accuracy, and the algorithms were chosen accordingly. For speaker identification, experiments on self-recorded data showed that training a dedicated model directly on each user's own speech outperforms models obtained by Bayesian (MAP) adaptation. For keyword spotting, because the system supports enrolling new users, their names cannot be known, and trained on, in advance; instead, sub-syllable models are concatenated into the model for each name, eliminating per-name training and improving practicality.
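The per-speaker modelling described above can be sketched in a few lines. The following is a minimal illustration using scikit-learn's GaussianMixture on synthetic two-dimensional "features"; the speaker names, feature dimensionality, and mixture count are placeholders, not the thesis's actual configuration (which uses cepstral features extracted from recorded speech):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Synthetic "feature frames" for two enrolled speakers. These stand in
# for the cepstral feature vectors the thesis extracts from recordings;
# the names, dimensions, and distributions are all placeholders.
train = {
    "alice": rng.normal(loc=0.0, scale=1.0, size=(200, 2)),
    "bob": rng.normal(loc=5.0, scale=1.0, size=(200, 2)),
}

# One dedicated GMM per enrolled user, trained only on that user's frames.
models = {
    name: GaussianMixture(n_components=4, random_state=0).fit(x)
    for name, x in train.items()
}

def identify(frames, models):
    """Return the enrolled user whose model gives the highest average
    log-likelihood over the test frames (maximum-likelihood decision)."""
    return max(models, key=lambda name: models[name].score(frames))

test_frames = rng.normal(loc=5.0, scale=1.0, size=(50, 2))
print(identify(test_frames, models))  # -> bob
```

Each enrolled user gets a model trained only on that user's data, which matches the thesis's finding that direct per-user models outperformed Bayesian-adapted ones in its experiments.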
In self-conducted tests with 40 participants, of whom 38 were authorized users and two played the role of impostors, the speaker recognition rate was 94.9%, the false acceptance rate 0.8%, and the keyword recognition rate 90.6%. Recognizing one utterance took about 0.5 seconds on average for each task, so recognition meets the real-time requirement.
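The reported figures are simple ratios over test trials. As a sanity check on the definitions, hypothetical trial counts (the thesis reports only the final rates, not its underlying counts) can be plugged into the two rates:

```python
def rates(correct, client_trials, false_accepts, impostor_trials):
    """Speaker recognition rate over authorized-user trials and false
    acceptance rate over impostor trials, both as percentages."""
    recognition = 100.0 * correct / client_trials
    far = 100.0 * false_accepts / impostor_trials
    return recognition, far

# Hypothetical counts chosen only to illustrate the definitions.
rec, far = rates(correct=187, client_trials=197,
                 false_accepts=1, impostor_trials=125)
print(f"recognition {rec:.1f}%  FAR {far:.1f}%")  # recognition 94.9%  FAR 0.8%
```

Note that the false acceptance rate is computed only over impostor trials, which is why it can be far lower than the complement of the recognition rate.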
Abstract (original English):
The purpose of this thesis is to design a speech access system that uses speaker recognition to determine whether an input voice belongs to a valid user. Combined with keyword spotting, the system can also identify the user's name, and coupled with text-to-speech it responds not only in text but also with a human-like voice. A windowed interface built with Microsoft Foundation Classes (MFC) makes the system easy for users to operate.
Because an access-control system must meet real-time (online) requirements, the time consumed by each method has to be taken into account: users will not wait long for a result. Methods therefore had to be selected with response time as the prerequisite, even though this selection affects the recognition rate, when choosing the appropriate algorithms.
Forty participants joined the tests: 38 were enrolled target users, while the other two acted as impostors. The speaker recognition rate is 94.9%, the false acceptance rate is 0.8%, and the keyword recognition rate is 90.6%. Recognizing a sentence takes about 0.5 seconds on average, so identification meets the real-time requirement.
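For reference, the Bayesian (MAP) adaptation that the experiments compared against direct per-speaker training is usually realized, in the GMM-UBM literature (e.g. Reynolds et al., reference [51]), as a relevance-weighted update of the mixture means. Below is a minimal NumPy sketch of that standard mean-only update, assuming diagonal covariances; it is a generic sketch, not the thesis's own implementation:

```python
import numpy as np

def map_adapt_means(ubm_means, ubm_weights, ubm_covars, frames, r=16.0):
    """Relevance-MAP adaptation of GMM mean vectors toward a speaker's
    data (mean-only update, diagonal covariances, relevance factor r)."""
    # Posterior probability of each mixture component for each frame.
    diff = frames[:, None, :] - ubm_means[None, :, :]              # (T, M, D)
    log_gauss = (-0.5 * np.sum(diff ** 2 / ubm_covars, axis=2)
                 - 0.5 * np.sum(np.log(2 * np.pi * ubm_covars), axis=1))
    log_post = np.log(ubm_weights) + log_gauss                     # (T, M)
    post = np.exp(log_post - log_post.max(axis=1, keepdims=True))
    post /= post.sum(axis=1, keepdims=True)

    n = post.sum(axis=0)                                           # soft counts
    ex = post.T @ frames / np.maximum(n[:, None], 1e-10)           # data means
    alpha = (n / (n + r))[:, None]                                 # adaptation weight
    return alpha * ex + (1 - alpha) * ubm_means

# Toy 1-D background model with two components; the "speaker" data
# clusters near 1.0, so only the first mean should move appreciably.
ubm_means = np.array([[0.0], [5.0]])
ubm_weights = np.array([0.5, 0.5])
ubm_covars = np.ones((2, 1))
frames = np.full((100, 1), 1.0)
adapted = map_adapt_means(ubm_means, ubm_weights, ubm_covars, frames)
print(adapted.ravel())  # first mean pulled toward 1.0, second stays near 5.0
```

Components with little posterior mass keep their background means, which is why MAP adaptation is robust with limited enrollment data; the thesis nonetheless found direct per-user training more accurate for its task.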
Abstract (Chinese) ................ i
Abstract (English) ................ ii
Contents .......................... iii
List of Figures ................... vi
List of Tables .................... viii
Chapter 1  Introduction
1.1 Motivation .................... 1
1.2 Objectives .................... 2
1.3 Overview of Speech Recognition ... 2
1.3.1 Dynamic Time Warping ........ 3
1.3.2 Artificial Neural Networks .. 4
1.3.3 Hidden Markov Models ........ 5
1.4 Overview of Access-Control Systems ... 6
1.5 Overview of Speaker Recognition ... 7
1.6 Chapter Outline ............... 10
Chapter 2  Speech Processing and Keyword Spotting
2.1 Feature Extraction ............ 11
2.2 Hidden Markov Models .......... 16
2.3 Acoustic Models and Training .. 18
2.4 Keyword Spotting .............. 23
2.4.1 Keyword-Spotting Architecture ... 24
2.4.2 One-Stage Dynamic Programming ... 27
2.4.3 Keyword Recognition Flow .... 29
2.5 Keyword Verification .......... 30
2.5.1 Keyword Verification Flow ... 31
2.5.2 Sub-syllable Hypothesis Testing ... 33
2.5.3 Confidence Measures for Keyword Verification ... 34
Chapter 3  Speaker Identification and Verification
3.1 Building Speech Models ........ 36
3.1.1 Gaussian Mixture Models ..... 36
3.1.2 Vector Quantization ......... 38
3.1.3 Expectation-Maximization Algorithm ... 40
3.2 Speaker Model Adaptation ...... 42
3.2.1 Universal Background Model .. 43
3.2.2 Bayesian Adaptation ......... 44
3.3 Speaker Identification ........ 47
3.4 Speaker Verification .......... 48
Chapter 4  Speech Access System Architecture and Results
4.1 Experimental Environment ...... 51
4.2 System Architecture ........... 52
4.3 System Flow ................... 56
4.4 System Experiments ............ 59
4.4.1 Speaker Identification Experiments ... 59
4.4.2 Keyword-Spotting Experiments ... 65
4.5 Related Work .................. 68
Chapter 5  Conclusions and Future Work
5.1 Conclusions ................... 70
5.2 Future Work ................... 71
References ........................ 72
Appendix .......................... 82
[1] X. Huang, A. Acero and H. W. Hon, Spoken Language Processing, Prentice Hall, 2001.
[2] G. R. Doddington, “Speaker recognition—Identifying people by their voices,” Proceedings of the IEEE, vol. 73, pp. 1651-1664, 1985.
[3] D. O'Shaughnessy, “Speaker recognition,” IEEE ASSP Magazine, vol. 3, pp. 4-17, 1986.
[4] J. T. Tou and R. C. Gonzalez, Pattern Recognition Principles, Addison Wesley, 1974.
[5] L. S. Lee and Y. Lee, “Voice Access of Global Information for Broad-Band Wireless: Technologies of Today and Challenges of Tomorrow,” Proceedings of the IEEE, vol. 89, no. 1, pp. 41-57, January 2001.
[6] L. Zao, A. Alcaim and R. Coelho, “Robust Access based on Speaker Identification for Optical Communications Security,” Digital Signal Processing, 2009 16th International Conference on, pp. 1-5, 2009.
[7] Wahyudi, W. Astuti and S. Mohamed, “A Comparison of Gaussian Mixture and Artificial Neural Network Models for Voice-based Access Control System of Building Security,” Information Technology, 2008. ITSim 2008. International Symposium on, vol. 3, pp. 1-8, 2008.
[8] 蔡仲齡, “Development of a face-recognition access-control system with speaker verification for small premises,” Master's thesis, National Cheng Kung University, July 2008.
[9] X. D. Huang and K. F. Lee, “On Speaker-Independent, Speaker-Dependent, and Speaker-Adaptive Speech Recognition,” Speech and Audio Processing, IEEE Transactions on, vol. 1, pp. 150-157, 1993.
[10] H. Sakoe and S. Chiba, “Dynamic Programming Algorithm Optimization for Spoken Word Recognition,” Acoustics, Speech and Signal Processing, IEEE Transactions on, vol. 26, pp. 43-49, 1978.
[11] C. Myers, L. Rabiner and A. Rosenberg, “Performance Tradeoffs in Dynamic Time Warping Algorithms for Isolated Word Recognition,” Acoustics, Speech and Signal Processing, IEEE Transactions on, vol. 28, pp. 623-635, 1980.
[12] D. P. Morgan and C. L. Scofield, Neural Network and Speech Processing, Kluwer Academic, 1991.
[13] P. Pujol, S. Pol, C. Nadeu and A. Hagen, “Comparison and Combination of Features in a Hybrid HMM/MLP and a HMM/GMM Speech Recognition,” Speech and Audio Processing, IEEE Transactions on, vol. 13, pp. 14-22, 2005.
[14] W. Dong-Liang, W. W. Y. Ng, P. P. K. Chan and D. Hai-Lan, “Access control by RFID and face recognition based on neural network,” Machine Learning and Cybernetics (ICMLC), 2010 International Conference on, vol. 2, pp. 675-680, 2010.
[15] S. Jieun and K. Howon, “The RFID Middleware System Supporting Context-Aware Access Control Service,” ICACT 2006, vol. 1, pp. 863-866, 2006.
[16] Y. Gizatdinova and V. Surakka, “Feature-Based Detection of Facial Landmarks from Neutral and Expressive Facial Images,” Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 28, pp. 135-139, 2006.
[17] P. N. Belhumeur, J. P. Hespanha and D. J. Kriegman, “Eigenfaces vs. Fisherfaces: Recognition Using Class Specific Linear Projection,” Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 19, pp. 711-720, 1997.
[18] M. J. Er, S. Wu, J. Lu and H. L. Toh, “Face Recognition With Radial Basis Function (RBF) Neural Network,” Neural Networks, IEEE Transactions on, vol. 13, no. 3, pp. 697-710, 2002.
[19] D. A. Reynolds and R. C. Rose, “Robust text-independent speaker identification using Gaussian mixture speaker models,” Speech and Audio Processing, IEEE Transactions on, vol. 3, pp. 72-83, 1995.
[20] L. R. Rabiner and B. H. Juang, Fundamentals of Speech Recognition, Prentice Hall, New Jersey, 1993.
[21] 王小川, Speech Signal Processing (語音訊號處理), 全華, March 2004.
[22] R. Vergin, D. O’Shaughnessy and A. Farhat, “Generalized Mel Frequency Coefficients for Large-Vocabulary Speaker-Independent Continuous-Speech Recognition,” IEEE Trans. Speech and Audio Processing, vol. 7, no. 5, pp. 525-532, September 1999.
[23] J. R. Deller, Jr., J. G. Proakis and J. H. L. Hansen, Discrete-Time Processing of Speech Signals, 1987.
[24] L. R. Rabiner, “A Tutorial on Hidden Markov Models and Selected Application in Speech Recognition,” Proceedings of the IEEE, vol. 77, no. 2, Feb. 1989.
[25] S. E. Levinson, L. R. Rabiner and M. M. Sondhi, “An Introduction to the Application of the Theory of Probabilistic Function of a Markov Process to Automatic Speech Recognition,” The Bell System Technical Journal, vol. 62, no. 4, April 1983.
[26] Changsheng Ai, Xuan Sun, Honghua Zhao and Xueren Dong, “Pipeline damage and leak sound recognition based on HMM,” Proceedings of the 7th World Congress on Intelligent Control and Automation, pp. 1940-1944, June 2008.
[27] 蔡永琪, “Keyword recognition based on sub-syllable units,” Master's thesis, National Central University, June 1995.
[28] M.-W. Koo, C.-H. Lee and B.-H. Juang, “Speech Recognition and Utterance Verification Based on a Generalized Confidence Score,” IEEE Trans. on Speech and Audio Processing, vol. 9, no. 8, pp. 821-832, Nov. 2001.
[29] J. Zhi-Hua and Y. Zhen, “Voice conversion using Viterbi algorithm based on Gaussian mixture model,” ISPACS 2007, pp. 32-35, Nov. 2007.
[30] 黃國彰, “A study on keyword spotting and verification,” Master's thesis, National Central University, June 1996.
[31] 王維邦, “Research and development of a keyword-spotting system for continuous Mandarin speech,” Master's thesis, National Central University, June 1997.
[32] H. Bourlard, B. D’hoore and J. M. Boite, “Optimizing recognition and rejection performance in wordspotting systems,” ICASSP-94, vol. 1, pp. I/373-I/376, 1994.
[33] H. Ney, “The use of a one-stage dynamic programming algorithm for connected word recognition,” IEEE Trans. on Acoustics, Speech, and Signal Processing, vol. 32, no. 2, pp. 263-271, April 1984.
[34] W. Jhing-Fa, W. Chung-Hsien, H. Chaug-Ching and L. Jau-Yien, “Integrating Neural Nets and One-Stage Dynamic Programming for Speaker Independent Continuous Mandarin Digit Recognition,” Acoustics, Speech, and Signal Processing, 1991, vol. 1, pp. 69-72, Apr 1991.
[35] J. Neyman and E. S. Pearson, “On the problem of the most efficient tests of statistical hypotheses,” Phil. Trans. R. Soc. Lond. A, vol. 231, pp. 289-337, 1933.
[36] J. Neyman and E. S. Pearson, “On the use and interpretation of certain test criteria for purpose of statistical inference,” Biometrika, pt I, vol. 20A, pp. 175-240, 1928.
[37] T. Kawahara, C.-H. Lee and B.-H. Juang, “Flexible Speech Understanding Based on Combined Key-Phrase Detection and Verification,” IEEE Trans. on Speech and Audio Processing, vol. 6, no. 6, pp. 558-568, Nov. 1998.
[38] T. Kawahara, C.-H. Lee and B.-H. Juang, “Combining Key-Phrase Detection and Subword-Based Verification For Flexible Speech Understanding,” Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing, vol. 2, pp. 1159-1162, Munich, Germany, May 1997.
[39] T. Chee-Ming, S.-H. Salleh, T. Tian-Swee and A. K. Ariff, “Text Independent Speaker Identification Using Gaussian Mixture Model,” ICIAS 2007, pp. 194-198, Nov. 2007.
[40] 黃夢晨, “A study of competing speakers in speaker identification with minimum-error discriminative training,” Master's thesis, National Central University, June 2008.
[41] F. Soong, A. Rosenberg, L. Rabiner and B. Juang, “A vector quantization approach to speaker recognition,” Acoustics, Speech, and Signal Processing, IEEE International Conference on ICASSP ‘85, vol. 10, pp. 387-390, 1985.
[42] Y. Linde, A. Buzo and R. Gray, “An Algorithm for Vector Quantizer Design,” Communications, IEEE Transactions on, vol. 28, no. 1, pp. 84-95, 1980.
[43] T. K. Moon, “The Expectation-Maximization Algorithm,” IEEE Signal Processing Magazine, vol. 13, no. 6, pp. 47-60, November 1996.
[44] S. Z. Selim and M. A. Ismail, “K-Means-Type Algorithms: A Generalized Convergence Theorem and Characterization of Local Optimality,” Pattern Analysis and Machine Intelligence, IEEE Transactions on, pp. 81-87, Jan. 1984.
[45] A. P. Dempster, N. M. Laird and D. B. Rubin, “Maximum likelihood from incomplete data via the EM algorithm,” Journal of the Royal Statistical Society, Series B, vol. 39, no. 1, pp. 1-38, 1977.
[46] D. A. Reynolds, “Comparison of background normalization methods for text-independent speaker verification,” EUROSPEECH ‘97, 5th European Conference on Speech Communication and Technology, pp. 963-966, 1997.
[47] J.-L. Gauvain and L. Chin-Hui, “Maximum a posteriori estimation for multivariate Gaussian mixture observations of Markov chains,” Speech and Audio Processing, IEEE Transactions on, vol. 2, pp. 291-298, 1994.
[48] C. J. Leggetter and P. C. Woodland, “Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models, ” Computer Speech & Language, vol. 9, pp. 171-185, 1995.
[49] A. Sankar and L. Chin-Hui, “A Maximum-Likelihood Approach to Stochastic Matching for Robust Speech Recognition, ” Speech and Audio Processing, IEEE Transactions on, vol. 4, pp. 190-202, May 1996.
[50] O. Siohan, C. Chesta and Lee Chin-Hui, “Joint maximum a posteriori adaptation of transformation and HMM parameters,” Speech and Audio Processing, IEEE Transactions on, vol. 9, pp. 417-428, 2001.
[51] D. A. Reynolds, T. F. Quatieri and R. B. Dunn, “Speaker verification using Adapted Gaussian mixture models, ” Digital Signal Processing, vol. 10, pp. 19-41, 2000.
[52] 范世明, “Applications of Gaussian mixture models to speaker identification and Mandarin speech recognition,” Master's thesis, National Chiao Tung University, 2002.
[53] 位元文化, Mastering MFC Windows Programming: Visual Studio 2008 Edition (精通MFC視窗程式設計-Visual Studio 2008版), 文魁資訊, 2008.
[54] R. F. Raposa, C++ and MFC Windows Programming (C++與MFC視窗程式設計), translated by 陳智湧, 歐世亮 and 林志偉, 文魁資訊, 2008.
[55] 溫家誠, “A speech recognition system for multimedia applications,” Master's thesis, National Central University, June 2008.