
National Digital Library of Theses and Dissertations in Taiwan


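Author: 蘇柏勳
Author (English): Po-Hsun Su
Thesis title: 對話代理人中問句補全及問答句對之自動產生
Thesis title (English): Automatic QA Pair Generation and Query Supplementation for Conversational Agents
Advisor: 吳宗憲
Advisor (English): Chung-Hsien Wu
Degree: Master's
Institution: National Cheng Kung University
Department: Department of Computer Science and Information Engineering (Master's and Doctoral Program)
Discipline: Engineering
Field: Electrical and Computer Engineering
Thesis type: Academic thesis
Publication year: 2012
Academic year of graduation: 101
Language: English
Pages: 59
Keywords (Chinese): 對話代理人, 對話機器人, 本體論, 關係萃取, 問答句產生
Keywords (English): Conversational Agent, Chatbot, Ontology, Relation Extraction, Question-Answer Generation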
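In recent research on conversational agents, responses are still generated mainly by keyword and pattern matching, and collecting the dialogue sentence pairs still requires extensive manual effort. This thesis proposes solutions to four major matching errors and weaknesses of Chinese conversational agents: word segmentation errors and the resulting matching errors caused by the many ways a single sentence can be expressed in spoken Chinese; the difficulty of responding to short spoken utterances; response latency caused by full-text search; and the heavy dependence of question-answer pairs on manual effort. The thesis aims to generate question-answer pairs automatically and at scale from information retrieved from the web, increasing the diversity and coverage of the agent's question-answer pairs so that it can expand its knowledge automatically. It also unifies and simplifies the different ways the same sentence is expressed in spoken Chinese, improving matching accuracy without adding new corpus data. In addition, it proposes two techniques, History Supplementation and Relation Supplementation, so that users can converse with the agent in a more natural and simpler way. Finally, it uses hierarchical text matching so that the agent can trade off matching accuracy against response time to suit the usage environment and keep users from losing interest and patience because of long response times.
In this thesis, Chinese sentences extracted from the web are processed to generate two types of questions, and sentence patterns are used to unify the grammatical forms of colloquial expressions that vary easily in Chinese, which improves question-answer pair matching performance. Incomplete spoken sentences are completed using the dialogue history. For semantically related questions, relations between words are extracted from Academia Sinica's E-HowNet Ontology (廣義知網) to raise the recall of text matching against question-answer pairs. Finally, a layered approach first identifies the topic and subtopic of the input and then performs fine-grained matching only against the sentences in the top N identified topics, reducing search and matching time while sacrificing only a little accuracy.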
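The layered matching described above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the whitespace tokenizer stands in for a Chinese word segmenter, the topic score (fraction of query tokens found in a topic's vocabulary) and the Jaccard sentence score are simple placeholders, and the names `hierarchical_match`, `TOPICS`, and `top_n` are hypothetical. This record does not state the thesis's actual scoring measurement.

```python
def tokenize(text):
    """Whitespace tokenizer; an assumed stand-in for a Chinese word segmenter."""
    return text.lower().split()


def jaccard(a, b):
    """Jaccard overlap between two token lists (placeholder sentence score)."""
    sa, sb = set(a), set(b)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def hierarchical_match(query, topics, top_n=1):
    """Return (answer, score, pairs_compared) for the best-matching QA pair.

    topics maps a topic name to a list of (question, answer) pairs.
    """
    q = tokenize(query)

    def topic_score(name):
        # Stage 1 score: fraction of query tokens present in the topic vocabulary.
        vocab = {tok for question, _ in topics[name] for tok in tokenize(question)}
        return sum(tok in vocab for tok in q) / len(q) if q else 0.0

    # Keep only the top-N topics; the rest are never searched.
    ranked = sorted(topics, key=topic_score, reverse=True)[:top_n]

    # Stage 2: fine-grained matching inside the selected topics only.
    best_answer, best_score, compared = None, -1.0, 0
    for name in ranked:
        for question, answer in topics[name]:
            compared += 1
            score = jaccard(q, tokenize(question))
            if score > best_score:
                best_answer, best_score = answer, score
    return best_answer, best_score, compared


# Hypothetical toy knowledge base.
TOPICS = {
    "weather": [
        ("what is the weather today", "Sunny."),
        ("will it rain tomorrow", "Probably not."),
    ],
    "food": [
        ("what is a good place to eat", "Try the night market."),
        ("is the noodle shop open today", "Yes."),
    ],
}

answer, score, compared = hierarchical_match("will it rain today", TOPICS, top_n=1)
```

With `top_n=1` the sketch compares the query against 2 QA pairs instead of all 4; raising `top_n` searches more topics, which costs time and may recover matches the topic stage would otherwise miss. That is the accuracy-latency tradeoff in miniature.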
Existing conversational agents rely on keyword and pattern matching over manually collected corpora and do not make full use of Chinese natural language processing technologies. Chinese conversational agents currently have four weaknesses: spontaneous syntactic variation in Chinese, concise utterances, response latency, and a limited knowledge base.
To address these four problems, automatic Question Generation is employed to expand the knowledge base and enrich the agent's answering ability. The Internal Representation unifies and simplifies sentences so that different textual utterances can be recognized without adding duplicate sentences to the knowledge base. Query Supplementation restores information missing from input sentences using the discourse history and semantically related words, which widens matching capability. Finally, Hierarchical Text Matching provides an accuracy-latency tradeoff so that the system can adapt flexibly to different situations and avoid response latency that exhausts the user's patience.
This thesis proposes a question generation method that generates Yes-No and WH question-answer pairs (QA pairs) from declarative sentences. Query Supplementation replenishes concise textual utterances in both temporal and spatial spaces, using dialog history and word relations, respectively. In History Supplementation, a user's concise textual utterance is recovered into a complete sentence, which benefits text matching: it leads directly to higher matching accuracy while reducing the need for multiple QA pairs covering different representations of a single utterance. In Relation Supplementation, word relations extracted from the Sinica E-HowNet Ontology are exploited to include words with the same or similar meaning, so that a higher answering recall is obtained. Finally, unlike full-text search, Hierarchical Text Matching searches only the QA pairs in the most likely classes, providing an accuracy-latency tradeoff. The evaluation results show the effectiveness of the proposed methods for the four problems above.
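As a rough illustration of three of the ideas in the abstract (Yes-No question generation, History Supplementation, and Relation Supplementation), here is a toy Python sketch. All rules and data in it are assumptions: appending the particle 嗎 is just one common way to form a Chinese yes-no question, the carry-over heuristic and the `min_len` threshold are invented for the example, and the `RELATED` table is a hypothetical stand-in for relations extracted from E-HowNet. English tokens stand in for segmented Chinese words. The thesis's actual rules are not given in this record.

```python
def make_yes_no_pair(declarative):
    """Build a (question, answer) pair by appending the particle 嗎.

    Assumed rule for illustration only: strip sentence-final punctuation,
    then add 嗎? for the question and 。 for the answer.
    """
    stem = declarative.rstrip("。.!! ")
    return stem + "嗎?", stem + "。"


def history_supplement(utterance, previous, min_len=3):
    """Complete a short utterance with tokens carried over from the previous turn.

    Both arguments are lists of already-segmented tokens. Utterances with at
    least min_len tokens are treated as complete and returned unchanged.
    """
    if len(utterance) >= min_len:
        return list(utterance)
    carried = [tok for tok in previous if tok not in utterance]
    return carried + list(utterance)


def relation_supplement(tokens, related):
    """Expand a token list with related words from a lookup table."""
    expanded = set(tokens)
    for tok in tokens:
        expanded |= related.get(tok, set())
    return expanded


# Hypothetical relation table; not real E-HowNet data.
RELATED = {"car": {"automobile", "vehicle"}}

question, answer = make_yes_no_pair("台南是一座城市。")
completed = history_supplement(
    ["on", "sunday"], ["the", "museum", "opens", "at", "nine"]
)
expanded = relation_supplement(["buy", "car"], RELATED)
```

In this sketch the short follow-up "on sunday" inherits the tokens of the previous turn before matching, and a query containing "car" can also match QA pairs that say "automobile" or "vehicle", which is the recall gain the abstract attributes to Relation Supplementation.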
Chinese Abstract
Abstract
Acknowledgements
Table of Contents
List of Tables
List of Figures
Chapter 1: Introduction
1.1 Introduction to the thesis
1.2 Background
1.3 Related Work
1.4 Research Goal
1.5 Thesis Organization
Chapter 2: System Framework and Related Toolsets
2.1 System Overview
2.2 Wikipedia
2.3 Sinica CKIP Chinese Segmentation and POS tagging
2.4 Sinica E-HowNet Ontology
Chapter 3: Proposed methods
3.1 Question Generation
3.1.1 Preparation for Question Generation
3.1.2 Approach of Question Generation
3.1.3 Feature Space Construction
3.2 Internal Representation
3.2.1 Yes-No Question Type
3.2.2 WH Question Type
3.3 Query Supplementation
3.3.1 History Supplementation
3.3.2 Relation Supplementation
A. Construction of Relation Matrices
B. Relation Supplementation using Relation Matrices
3.4 Hierarchical Text Matching
3.4.1 Topics and Subtopics
3.4.2 Text Matching and Scoring Measurement
Chapter 4: Experiment and Discussion
4.1 Question Generation and Screening
4.2 Internal Representation
4.3 Query Supplementation - History Supplementation
4.4 Query Supplementation - Relation Supplementation
4.5 Hierarchical Text Matching
Chapter 5: Conclusion and Future Work
5.1 Conclusion
5.2 Future Work
Bibliography
Appendix 1: Implementation of the proposed system
