Thesis Title (English): Automatic QA Pair Generation and Query Supplementation for Conversational Agents
Advisor (English): Chung-Hsien Wu
Keywords (English): Conversational Agent; Chatbot; Ontology; Relation Extraction; Question-Answer Generation
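In this thesis, we generate two types of questions from Chinese sentences extracted from the Web, and we use sentence patterns to unify the grammatical forms of colloquial Chinese expressions that vary easily, thereby improving the matching performance for question-answer pairs. Sentences that are expressed incompletely in spoken language are completed using the dialogue history. For semantically related questions, relations between words are extracted from the Academia Sinica E-HowNet Ontology, which raises the recall of text matching against the question-answer pairs. Finally, matching proceeds hierarchically: the topic and subtopic of the input are identified first, and detailed matching is then performed only on the sentences contained in the top N identified topics, which reduces the time needed for search and matching at the cost of only a slight loss in accuracy.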
Existing conversational agents, which rely on keyword and pattern matching over manually collected corpora, do not make full use of Chinese natural language processing technology. Chinese conversational agents currently have four weaknesses: spontaneous syntactic variation in Chinese, concise utterances, response latency, and a limited knowledge base.
To address these four problems, automatic Question Generation is employed to expand the knowledge base and enrich the agent's answering ability. The Internal Representation unifies and simplifies sentences so that different textual forms of an utterance can be recognized without adding duplicate sentences to the knowledge base. Query Supplementation restores information missing from input sentences, using the discourse history and semantically related words, to widen the matching capability. Finally, Hierarchical Text Matching provides an accuracy-latency tradeoff, so that the system can handle different situations flexibly and avoid response latency that exhausts the user's patience.
This thesis proposes a question generation method that produces Yes-No and WH question-answer pairs (QA pairs) from declarative sentences. Query Supplementation replenishes concise textual utterances along both temporal and spatial dimensions, using dialog history and word relations, respectively. In History Supplementation, a user's concise textual utterance is restored to a complete sentence, which benefits text matching: matching accuracy rises, and fewer QA pairs are needed to cover different representations of a single utterance. In Relation Supplementation, word relations extracted from the Sinica E-HowNet Ontology are exploited to include words with the same or similar meaning, so that a higher answering recall rate is obtained. Finally, unlike full-text search, Hierarchical Text Matching searches only the QA pairs in the most likely classes, which provides an accuracy-latency tradeoff. The evaluation results show the effectiveness of the proposed method on the four problems described above.
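The two-stage idea behind Hierarchical Text Matching can be illustrated with a minimal sketch. It is not the thesis's implementation: the function names, the toy English tokens (the actual system works on segmented Chinese text), and the use of Jaccard overlap as the scoring measure are all assumptions made for illustration.

```python
def jaccard(a, b):
    """Jaccard overlap between two token collections (an assumed scoring measure)."""
    sa, sb = set(a), set(b)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def hierarchical_match(query_tokens, topics, top_n=1):
    """Return (score, question_tokens, answer) for the best QA pair in the top_n topics.

    `topics` maps a topic name to a list of (question_tokens, answer) pairs.
    """
    # Stage 1: rank topics by overlap between the query and each topic's vocabulary.
    ranked = sorted(
        topics,
        key=lambda t: jaccard(query_tokens, [w for q, _ in topics[t] for w in q]),
        reverse=True,
    )
    # Stage 2: fine-grained matching restricted to the top_n ranked topics.
    best = (0.0, None, None)
    for topic in ranked[:top_n]:
        for q_tokens, answer in topics[topic]:
            score = jaccard(query_tokens, q_tokens)
            if score > best[0]:
                best = (score, q_tokens, answer)
    return best


# Hypothetical toy knowledge base, for illustration only.
TOPICS = {
    "geography": [
        (["capital", "of", "france"], "Paris"),
        (["longest", "river", "in", "asia"], "Yangtze"),
    ],
    "music": [
        (["composer", "of", "moonlight", "sonata"], "Beethoven"),
    ],
}

result = hierarchical_match(["capital", "france"], TOPICS, top_n=1)
print(result[2])
```

Raising `top_n` searches more topics, trading latency for a lower chance of missing the correct QA pair; setting it to the number of topics degenerates to a full search.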
Chinese Abstract
Abstract
Acknowledgements
Table of Contents
List of Tables
List of Figures
Chapter 1: Introduction
1.1 Introduction to the Thesis
1.2 Background
1.3 Related Work
1.4 Research Goal
1.5 Thesis Organization
Chapter 2: System Framework and Related Toolsets
2.1 System Overview
2.2 Wikipedia
2.3 Sinica CKIP Chinese Segmentation and POS Tagging
2.4 Sinica E-HowNet Ontology
Chapter 3: Proposed Methods
3.1 Question Generation
3.1.1 Preparation for Question Generation
3.1.2 Approach of Question Generation
3.1.3 Feature Space Construction
3.2 Internal Representation
3.2.1 Yes-No Question Type
3.2.2 WH Question Type
3.3 Query Supplementation
3.3.1 History Supplementation
3.3.2 Relation Supplementation
A. Construction of Relation Matrices
B. Relation Supplementation using Relation Matrices
3.4 Hierarchical Text Matching
3.4.1 Topics and Subtopics
3.4.2 Text Matching and Scoring Measurement
Chapter 4: Experiment and Discussion
4.1 Question Generation and Screening
4.2 Internal Representation
4.3 Query Supplementation - History Supplementation
4.4 Query Supplementation - Relation Supplementation
4.5 Hierarchical Text Matching
Chapter 5: Conclusion and Future Work
5.1 Conclusion
5.2 Future Work
Bibliography
Appendix 1: Implementation of the Proposed System
