National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)

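Author: 陳嘉明
Author (English): Chia-Ming Chen
Thesis title: 仿人類記憶模式語言模型學習之大詞彙語音辨識與聲學文詞後處理之客製化對話系統
Thesis title (English): Customized Spoken Dialogue System Based on LVCSR with Human Memory-Like Language Model Learning and Prosodic-Contextual Post-processing
Advisor: 王駿發
Advisor (English): Jhing-Fa Wang
Degree: Master's
Institution: National Cheng Kung University
Department: Department of Electrical Engineering (master's and doctoral program)
Discipline: Engineering
Field: Electrical and Computer Engineering
Thesis type: Academic thesis
Publication year: 2012
Academic year of graduation: 100
Language: English
Pages: 43
Keywords (Chinese): 大詞彙語音辨識; 客製化對話系統; 仿人類記憶語言模型學習; 語音辨識後處理
Keywords (English): LVCSR; customized spoken dialogue system; human memory-like language model learning; post-processing of speech recognition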
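As technology advances, the convenience of human-machine interfaces is becoming increasingly important. Compared with the traditional keyboard and mouse, the KINECT motion-sensing camera, which uses depth and image detection, and Siri, the mobile voice assistant on the iPhone 4S, are gradually taking their place. However, the large-vocabulary speech recognition component of traditional spoken dialogue systems has long suffered from two problems: the corpus cannot keep pace with time and new knowledge, and the recognition rate is relatively low. On the dialogue-processing side, such systems cannot recognize utterances and respond appropriately according to each user's needs and habitual phrasing. This thesis therefore makes the following improvements. 1. By observing how human memory learns, the language model of large-vocabulary speech recognition is divided into a short-term language model and a long-term language model. The aim is to imitate human memory, placing urgent events in short-term memory and important events in long-term memory. 2. To raise the recognition rate, the speech signal is analyzed (prosodic analysis) using the autocorrelation function (ACF) and the average syllable length, which yields the number of syllables and the pitch variation in the signal. The recognition result is then analyzed (contextual analysis): the frequency of each part of speech and a bigram model over adjacent parts of speech are used to estimate the intonation of the recognition result, which is compared with the pitch-variation information from the speech signal to improve the recognition rate. 3. A customized user interface is designed that links keywords to the decision nodes of a dialogue decision tree. In the experiments, post-processing reduces the word error rate of the recognition results by about 6.4%, and long-term learning reduces the word error rate by about 27.55%.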
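The abstract names the autocorrelation function (ACF) as the basis of the prosodic analysis but gives no parameters. As a rough illustration only, the sketch below estimates the pitch of a single voiced frame by picking the lag with the largest autocorrelation. The function names, the 80 to 400 Hz search band, and the synthetic test tone are all assumptions made for this example, not details taken from the thesis.

```python
import math


def autocorrelation(frame, lag):
    """Unnormalized autocorrelation of one frame at the given lag."""
    return sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))


def estimate_pitch_hz(frame, sample_rate, f_min=80.0, f_max=400.0):
    """Estimate pitch (Hz) of a voiced frame from the ACF peak.

    f_min and f_max bound the search band; both are assumed values.
    """
    lag_min = int(sample_rate / f_max)  # shortest pitch period considered
    lag_max = int(sample_rate / f_min)  # longest pitch period considered
    lag_max = min(lag_max, len(frame) - 1)
    best_lag = max(
        range(lag_min, lag_max + 1),
        key=lambda lag: autocorrelation(frame, lag),
    )
    return sample_rate / best_lag


# Synthetic check: a pure 200 Hz tone sampled at 8 kHz (hypothetical input).
sample_rate = 8000
frame = [math.sin(2 * math.pi * 200.0 * n / sample_rate) for n in range(400)]
pitch = estimate_pitch_hz(frame, sample_rate)
```

Running the estimator frame by frame would give a pitch contour, which is the kind of pitch-variation information the abstract describes comparing against the recognized text. Real speech would also need voicing detection and some smoothing, which this sketch leaves out.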
As technology develops, the human-machine interface becomes increasingly important. Compared with traditional interfaces, a spoken dialogue system is more convenient. However, existing LVCSR (Large Vocabulary Continuous Speech Recognition) and dialogue systems have several shortcomings. The corpora used for LVCSR cannot evolve with time and new information, the recognition rate of LVCSR is relatively low, and the dialogue system cannot accommodate the different keywords that individual users require. We therefore improve the system in three respects. First, based on observations of human memory, we propose a human memory-like language model that learns urgent corpora in a short-term language model and important corpora in a long-term language model. Second, to increase the recognition rate, we build a prosodic-contextual post-processing stage. Third, we design a customization user interface that accommodates the different keywords that individual users require. Finally, the accuracy of the prosodic-contextual post-processing and the recognition results of LVCSR are evaluated by word error rate (WER). Prosodic-contextual post-processing reduces the average WER by 6.4%, and long-term learning reduces the average WER across eight people by 27.55%. These results indicate that the proposed system is effective as a human-machine interface.
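Both reported results are stated in terms of word error rate. For readers unfamiliar with the metric, here is a minimal sketch of the standard definition: the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. The function name and whitespace tokenization are assumptions for illustration; the record does not say which scoring tool or tokenization the thesis used, nor whether the reported reductions are absolute or relative.

```python
def word_error_rate(reference, hypothesis):
    """WER = word-level edit distance / number of reference words.

    Tokens are split on whitespace (an assumption; Mandarin text would
    need to be segmented into words or characters beforehand).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one token")

    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words so far
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one substitution and one deletion in four words.
example_wer = word_error_rate("a b c d", "a x c")
```

Averaging this value over all test utterances before and after a change, such as enabling post-processing, gives the kind of comparison the abstract reports.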
ABSTRACT (CHINESE)
ABSTRACT
ACKNOWLEDGMENTS
CONTENT
FIGURE LIST
TABLE LIST
CHAPTER 1 INTRODUCTION
1.1 BACKGROUND
1.2 MOTIVATION
1.3 RELATED WORKS
1.4 THESIS OBJECTIVES
CHAPTER 2 SYSTEM OVERVIEW
2.1 FRAMEWORK OF THE PROPOSED SYSTEM
2.2 HUMAN MEMORY-LIKE LANGUAGE MODEL LEARNING METHOD
2.3 CUSTOMIZED DIALOGUE SYSTEM DESIGN
CHAPTER 3 HUMAN MEMORY-LIKE LANGUAGE MODEL LEARNING METHOD
3.1 PREPARATION FOR ACOUSTIC MODEL
3.1.1 Voice Activity Detection
3.1.2 Feature Extraction
3.1.3 Acoustic Model Training
3.2 PREPARATION FOR LANGUAGE MODEL
3.2.1 Pre-processing of Text Corpora
3.2.2 Language Model Training
3.3 SPEECH RECOGNITION SEARCH ALGORITHM
3.4 PROSODIC-CONTEXTUAL POST-PROCESSING
3.4.1 Prosodic Analysis
3.4.2 Contextual Analysis
3.4.3 Candidates Scoring and Selection
3.5 HUMAN MEMORY-LIKE LANGUAGE MODEL
CHAPTER 4 CUSTOMIZED SPOKEN DIALOGUE SYSTEM
4.1 CUSTOMIZATION INTERFACE DESIGN
4.2 DIALOGUE MANAGEMENT AND STRATEGY
CHAPTER 5 EXPERIMENTS
5.1 EXPERIMENTAL ENVIRONMENT
5.1.1 Experimental Corpora
5.1.2 Experimental Tools
5.2 EXPERIMENTAL RESULTS AND COMPARISON
5.2.1 Experiment on LVCSR and Prosodic-Contextual Post-Processing
CHAPTER 6 CONCLUSIONS AND FUTURE WORKS
6.1 CONCLUSIONS
6.2 FUTURE WORKS
REFERENCES
ABOUT THE AUTHOR

