Graduate Student (English name): Yi-Lun Wang
Thesis Title (English): A Chinese Reading Comprehension and Question Answering System Based on Attention Mechanism and Convolutional Neural Networks
Advisor (English name): Chin-Shyurng Fahn
Oral Defense Committee (English names): Chiou-Shann Fuh, Sheng-Jyh Wang, Kuan-Yu Chen
Keywords (English): Chinese Machine Reading Comprehension, Natural Language Processing, Attention Mechanism, Convolutional Neural Network, Deep Learning, Fast Convergence, Less Memory Usage
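  In the main architecture design, we abandon the traditional Recurrent Neural Network (RNN) architecture and instead adopt the recently popular Self-Attention mechanism combined with a Convolutional Neural Network (CNN), which saves training time more effectively. In addition, at the interaction layer we apply Context-Query Attention between the article and the question twice. This improves the computation of the interaction between the two, so that the model extracts question-related information from the article more quickly and effectively and converges rapidly.
  In the experiments, we use the Delta Reading Comprehension Dataset (DRCD) as the main object of study in the Chinese setting. For scoring, we use two measures: the Exact Match score (EM) and the fuzzy-match score (F1). On a Titan XP graphics card with relatively little memory, our model reaches a Chinese reading comprehension accuracy of 65% EM and 79% F1 after about 1 hour of training, which is roughly 3 times faster than other models with similar architectures.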
  Deep learning and natural language processing span many research topics, and one of the most popular is machine reading comprehension for question answering. Given a question posed by a person, the computer searches articles provided in advance and extracts the answer from them, a capability with many applications in robotics and intelligent personal assistants. However, the model architectures published in recent years have grown steadily larger, so training and deploying them consumes substantial resources.
  To overcome these problems, this thesis proposes a new deep learning model for reading comprehension in Chinese. The model can be trained on an ordinary GPU and converges in a short time. For pre-processing, we use an existing Chinese text segmentation package and a pre-trained word embedding dictionary. We also feed the embedding vector of every individual character to the model as an additional input, so that the model obtains more information and is less affected by text segmentation errors.
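A minimal sketch of how word-level and character-level inputs might be prepared side by side. It assumes the text has already been segmented into words by some external package; the function name `build_inputs`, the padding token, and the per-word character limit are illustrative assumptions, not details taken from the thesis.

```python
from typing import List, Tuple


def build_inputs(
    words: List[str],
    max_chars_per_word: int = 4,  # assumed limit, chosen only for illustration
    pad: str = "<pad>",           # assumed padding token
) -> Tuple[List[str], List[List[str]]]:
    """Return the word sequence and, for each word, a fixed-length list of its characters."""
    char_grid = []
    for word in words:
        # Truncate long words, then pad short ones to a fixed width.
        chars = list(word)[:max_chars_per_word]
        chars += [pad] * (max_chars_per_word - len(chars))
        char_grid.append(chars)
    return words, char_grid


# Hypothetical, already-segmented input.
words = ["台達", "閱讀", "理解", "資料集"]
word_seq, char_grid = build_inputs(words)
```

Even if the segmenter splits a word incorrectly, the character rows still carry the underlying characters, which is the kind of extra signal the paragraph above describes.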
  In the main architectural design, we abandon the traditional Recurrent Neural Network (RNN) and instead adopt the recently popular Self-Attention mechanism combined with a Convolutional Neural Network (CNN), which saves training time effectively. At the interaction layer, we apply Context-Query Attention twice to strengthen the computation of the interaction between the article and the question, so that the model acquires question-related information from the article more effectively and converges faster.
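The context-to-query direction of such an attention step can be sketched in plain Python as follows. This is an illustration only: it assumes a simple dot-product similarity between context and query vectors, which may differ from the similarity function used in the thesis, and it does not show how the two attention passes are combined. All names are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def context_to_query(context: List[Vector], query: List[Vector]) -> List[Vector]:
    """For each context position, return an attention-weighted average of the query vectors."""
    dim = len(query[0])
    attended = []
    for c in context:
        # Similarity of this context vector to every query vector (assumed: dot product).
        weights = softmax([dot(c, q) for q in query])
        attended.append(
            [sum(w * q[d] for w, q in zip(weights, query)) for d in range(dim)]
        )
    return attended


# Toy vectors: two context positions, two query positions.
context = [[1.0, 0.0], [0.0, 1.0]]
query = [[10.0, 0.0], [0.0, 10.0]]
attended = context_to_query(context, query)
```

Each context position ends up with a summary of the query words most similar to it, which is how question-related information is brought into the article representation.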
  In the experiments, we adopt the Delta Reading Comprehension Dataset (DRCD) as the main test data for Chinese. Performance is scored with the Exact Match (EM) and F1 measures. The results show that our model reaches 65% EM and 79% F1 with less than 1 hour of training on a Titan XP GPU, which has relatively little memory. This is about 3 times faster than other models with similar architectures.
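For reference, the two measures can be sketched as below. This assumes that F1 is computed over overlapping characters, a common choice for Chinese answer spans, and it omits any normalization (such as punctuation removal) that an official evaluation script may apply. The function names and the example strings are illustrative.

```python
from collections import Counter


def exact_match(prediction: str, reference: str) -> float:
    """1.0 if the predicted answer equals the reference exactly, else 0.0."""
    return float(prediction.strip() == reference.strip())


def f1_score(prediction: str, reference: str) -> float:
    """Character-level F1 between the predicted and reference answers."""
    pred_chars = list(prediction.strip())
    ref_chars = list(reference.strip())
    # Multiset intersection counts each shared character at most as often as it appears in both.
    overlap = sum((Counter(pred_chars) & Counter(ref_chars)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_chars)
    recall = overlap / len(ref_chars)
    return 2 * precision * recall / (precision + recall)


# Hypothetical prediction that covers four of the six reference characters.
em = exact_match("台達電子", "台達電子公司")
f1 = f1_score("台達電子", "台達電子公司")
```

In this example the prediction is not an exact match, so EM is 0, while F1 gives partial credit: precision is 4/4 and recall is 4/6, for an F1 of 0.8. This is why F1 is generally higher than EM on the same predictions.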
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Overview
1.2 Motivation
1.3 System Description
1.4 Thesis Organization
Chapter 2 Related Work
2.1 RNN-based Reading Comprehension
2.2 Attention-based Reading Comprehension
2.3 Recently Popular Huge Architecture
Chapter 3 Natural Language Processing and Deep Learning
3.1 Tokenization and Embedding
3.2 Convolutional Neural Network (CNN)
3.2.1 Ordinary convolution
3.2.2 Depthwise separable convolution
3.3 Recurrent Neural Network (RNN)
3.4 Attention Mechanism
3.4.1 Context-query attention
3.4.2 Self-attention
Chapter 4 Machine Reading Comprehension Model
4.1 Input Pre-processing Layer
4.2 Input Encoding Layer
4.3 Interaction Layer
4.4 Model Encoding Layer
4.5 Output Layer
Chapter 5 Experimental Results and Discussion
5.1 Experimental Setup
5.2 Test on Stanford Question Answering Dataset
5.3 Test on Delta Reading Comprehension Dataset
Chapter 6 Conclusions and Future Work
6.1 Conclusions
6.2 Future Work
