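Author: 許晏銘
Author (English): Yan-Ming Hsu
Thesis title: 基於動態規劃之機器學習方法於小字彙DTW語音辨識系統之研究
Thesis title (English): Machine Learning Based on Dynamic Programming for Small-Sized Vocabulary DTW Speech Recognition
Advisor: 丁英智
Advisor (English): Ing-Jhih Ding
Degree: Master's
Institution: National Formosa University (國立虎尾科技大學)
Department: Graduate Institute of Electrical Engineering
Discipline: Engineering
Field: Electrical Engineering and Computer Science
Thesis type: Academic thesis
Year of publication: 2013
Academic year of graduation: 101 (ROC calendar)
Language: Chinese
Number of pages: 70
Keywords (Chinese): 語音辨識; 動態時間校正; 機器學習; 改良式Viterbi; SHMM
Keywords (English): speech recognition; dynamic time warping; machine learning; iViterbi; SHMM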
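This thesis proposes a machine learning approach based on dynamic programming for small-vocabulary DTW speech recognition systems. Early speech recognition technology centered on dynamic time warping (DTW), which derives from the principle of dynamic programming (DP). A conventional DTW recognizer, however, works by template matching, so the number of reference templates affects both the matching speed and the recognition performance of the whole system. In addition, in small-vocabulary isolated-word recognition, accuracy also suffers when the isolated words contain too many characters. To address these weaknesses, the thesis first proposes three machine learning methods for DTW speech recognition: two supervised methods, incremental learning and priority rejection learning, and one unsupervised method, most matching learning. Experiments show that these learning methods effectively improve the recognition accuracy of conventional DTW.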
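The template-matching recognition described above can be illustrated with a minimal sketch. It shows textbook DTW with Euclidean frame distances and a nearest-template decision, not the thesis's own implementation; the function names `dtw_distance` and `recognize` and the layout of the `templates` dictionary are assumptions made for illustration.

```python
from math import inf

def dtw_distance(a, b):
    """Accumulated DTW cost between two sequences of feature vectors."""
    n, m = len(a), len(b)
    # cost[i][j] = best accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Euclidean distance between the two frames (an assumed local cost)
            d = sum((x - y) ** 2 for x, y in zip(a[i - 1], b[j - 1])) ** 0.5
            # Standard step pattern: vertical, horizontal, or diagonal move
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

def recognize(utterance, templates):
    """Return the word whose reference template has the lowest DTW cost.

    templates: dict mapping each word to a list of reference sequences
    (hypothetical layout chosen for this sketch).
    """
    best_word, best_cost = None, inf
    for word, refs in templates.items():
        for ref in refs:
            c = dtw_distance(utterance, ref)
            if c < best_cost:
                best_word, best_cost = word, c
    return best_word
```

Every reference template is compared against the input, so the recognition time grows with the number of stored templates, which is the speed concern raised in the abstract.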
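The template matching used by conventional DTW is a non-modeling approach, and it still has many weaknesses in recognition matching and in system learning that need to be overcome. To resolve this and strengthen conventional DTW recognition, the thesis builds on the work above and proposes a hidden Markov model (HMM)-like method for small-vocabulary DTW speech recognition. An HMM is a probabilistic model built on state transitions and statistical theory. The thesis designs a simplified version of this model-based technique and embeds it in conventional DTW recognition; the resulting method is called SHMM (simplified HMM). Within the SHMM framework, the Viterbi algorithm customarily used for HMM recognition is combined with the dynamic programming of DTW to form an improved Viterbi algorithm, which improves template recognition performance. When making a recognition decision, the system first obtains the result of the improved Viterbi algorithm and the result of DTW dynamic-programming matching, and then uses a fuzzy controller with fuzzy-logic inference to fuse the two values into a final recognition output. This method is called FuzzySHMMDTW. Experimental results show that, for small-vocabulary isolated-word recognition, FuzzySHMMDTW achieves higher recognition accuracy than conventional DTW speech recognition.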
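For orientation, the conventional Viterbi recursion that the improved version starts from can be sketched as follows. This is the textbook log-domain algorithm only; the record does not describe how iViterbi merges it with DTW or how the fuzzy controller is designed, so neither is reproduced here. The function name `viterbi` and the argument layout are assumptions made for illustration.

```python
from math import inf, log

def viterbi(obs_loglik, log_trans, log_init):
    """Textbook Viterbi decoding in the log domain.

    obs_loglik[t][s]: log-likelihood of frame t under state s
    log_trans[p][s]:  log transition probability from state p to state s
    log_init[s]:      log initial probability of state s
    Returns (best log score, best state path).
    """
    num_states = len(log_init)
    score = [log_init[s] + obs_loglik[0][s] for s in range(num_states)]
    back = []  # back[t-1][s] = best predecessor of state s at frame t
    for t in range(1, len(obs_loglik)):
        new_score, ptr = [], []
        for s in range(num_states):
            prev = max(range(num_states), key=lambda p: score[p] + log_trans[p][s])
            new_score.append(score[prev] + log_trans[prev][s] + obs_loglik[t][s])
            ptr.append(prev)
        score = new_score
        back.append(ptr)
    # Trace the best path backwards from the best final state
    last = max(range(num_states), key=lambda s: score[s])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    path.reverse()
    return score[last], path

# Usage: a two-state left-to-right model with made-up log-likelihoods
log_init = [0.0, -inf]
log_trans = [[log(0.5), log(0.5)], [-inf, 0.0]]
obs_loglik = [[0.0, -5.0], [0.0, -5.0], [-5.0, 0.0], [-5.0, 0.0]]
best_score, best_path = viterbi(obs_loglik, log_trans, log_init)
```

Both this recursion and DTW fill a table by dynamic programming over frames, which is what makes combining the two, as the thesis proposes, structurally natural.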
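With the SHMM modeling method designed for small-vocabulary DTW speech recognition, the recognition system contains models with state-level statistics, so DTW matching is no longer pure template matching. To let this model-based DTW recognizer adapt to the pronunciation characteristics of different speakers, and thereby keep the system's average recognition rate at a stable level, the thesis designs two model learning methods: one driven mainly by the quantity of the learning data and one driven mainly by its quality. Experimental results show that, for speakers with poor recognition results, accuracy improves effectively after model learning and reaches an acceptable level.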


This thesis presents a new framework of machine learning based on dynamic programming for small-sized vocabulary DTW speech recognition. Two categories of learning strategies for DTW are developed first: supervised learning and unsupervised learning. The supervised methods are incremental learning and priority rejection learning. For unsupervised learning, an approach called most matching learning is developed. Experiments confirm that all three methods improve the recognition performance of DTW.
In addition, we present a hidden Markov model (HMM)-like approach for DTW speech recognition, called simplified HMM (SHMM). SHMM is a simplified HMM modeling technique for conventional DTW. Under the SHMM framework, an improved Viterbi algorithm, called iViterbi, is proposed. iViterbi combines the dynamic programming of DTW with the optimal-path calculation of conventional Viterbi for pattern recognition. Finally, we design a fuzzy controller that the recognition system uses when deciding on a recognition result. The fuzzy scheme performs model fusion, combining the DTW and iViterbi recognition outcomes. The overall recognition system with fuzzy control is therefore called FuzzySHMMDTW. Experimental results on small-sized vocabulary speech recognition show that the recognition rate of the proposed FuzzySHMMDTW is higher than that of traditional DTW.
To keep the recognition performance of FuzzySHMMDTW at a stable level even when the system encounters an unfamiliar speaker, we propose two modeling-based machine learning methods for FuzzySHMMDTW. Experimental results demonstrate the effectiveness of both methods: through learning, the recognition performance of the system continues to improve.


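Abstract (Chinese)
Abstract
Acknowledgments
Table of Contents
List of Tables
List of Figures
Chapter 1 Introduction
1.1 Research Motivation
1.2 Overview of Speech Recognition
1.3 Research Direction
1.4 Chapter Outline
Chapter 2 Speech Recognition Systems and Fundamental Techniques
2.1 Feature Extraction from Speech Signals
2.2 Dynamic Time Warping (DTW)
2.2.1 Local Path Constraints
2.2.2 Full DTW Matching
2.3 Hidden Markov Model (HMM)
2.3.1 Viterbi Algorithm
2.4 Acoustic Model
Chapter 3 Machine Learning for DTW Speech Recognition
3.1 Machine Learning for Non-Model-Based DTW
3.1.1 Incremental Learning (IL)
3.1.2 Priority Rejection Learning (PRL)
3.1.3 Most Matching Learning (MML)
3.2 Simplified HMM (SHMM) for DTW Speech Recognition
3.2.1 SHMM Model Initialization
3.2.2 Iterative SHMM Training and Optimal Model Construction
3.2.3 SHMM for Keyword Speech Recognition
3.3 Combining Viterbi and DTW for SHMM Model Matching
3.3.1 Improving Viterbi with DTW (improved Viterbi, iViterbi)
3.3.2 SHMM Recognition Matching
3.4 Fuzzy Decision Fusion of SHMM and DTW (FuzzySHMMDTW)
3.4.1 Similarity Score
3.4.2 Design of the FuzzySHMMDTW Fuzzy Mechanism
3.5 Machine Learning for SHMM Model-Based DTW
3.6 Discussion of Model-Based and Non-Model-Based DTW Speech Recognition
Chapter 4 Experimental Results and Comparative Analysis
4.1 Experimental Setup and Speech Database Construction
4.2 Experiments on Machine Learning for Non-Model-Based DTW
4.2.1 Incremental Learning Experiments
4.2.2 Priority Rejection Learning Experiments
4.2.3 Most Matching Learning Experiments
4.3 Experiments on SHMM Model-Based DTW
4.3.1 Experiments on the Number of SHMM States
4.3.2 FuzzySHMMDTW Experiments
4.4 Experiments on Machine Learning for SHMM Model-Based DTW
Chapter 5 Conclusion
References
Extended Abstract
Curriculum Vitae (CV)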


