Graduate Student: Ching-Wen Cheng
Thesis Title: TextGraphBART: Unifying Graph and Text Generation with Structure Token
Advisors: Ping-Cheng Yeh, Hung-Yi Lee
Oral Defense Committee: Tzong-Han Tsai, Yen-Lung Tsai
Keywords: Deep Learning, Knowledge Graph, Graph Generation
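In recent years, generative models have received growing attention. Models based on the Transformer or on attention in particular have produced many results across domains such as text, music, images, and video. Meanwhile, there has been little progress on generating graph structures with text labels (such as knowledge graphs and mind maps). Because this problem involves generating both the graph structure and the text labels, previous methods fall roughly into two kinds: one handles the text and the graph structure with two separate models, and the other decomposes the graph into text sequences and processes them with a sequence model. However, the two-model approach tends to miss the information exchanged between graph structure and text, while decomposing the graph into sequences loses part of the graph structure information and reduces generation efficiency. This thesis proposes a structure token that converts graph structure and text together into a single representation. With this representation, the model can learn and generate graph structure and text more efficiently, and on top of it we also propose a pre-training method. To demonstrate the effectiveness of the method, we test it on two public datasets, and the results show that our method reaches scores comparable to previous models with fewer parameters.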
The Transformer layer has been shown to work well in several domains beyond text, such as audio, images, and even multi-modal data. The idea behind these models is that different kinds of input can be treated as a series of tokens. Recent research has also shown that, with carefully designed input tokens, a pure transformer encoder can be a powerful graph encoder. Taking a step further in this direction, we propose a new kind of input representation called the "Structure Token". With structure tokens, we can represent a graph with text labels as a sequence of tokens. By converting both graph and text into structure tokens, we train a pure transformer encoder-decoder that learns a unified representation and generates both graph and text with the same model. We also propose a new pre-training method that is similar to mBART pre-training but uses structure tokens. In this paper, we show that the proposed method lets us train a smaller model whose performance is comparable to that of the T5 variants on text-to-graph and graph-to-text tasks.
Acknowledgements
Abstract (in Chinese)
Abstract
Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Graph Structure in Natural Language Processing
1.2 Common Methods for Generating Text Graph
1.2.1 Multi-stage Approach
1.2.2 Graph Linearization
1.3 Motivation
1.4 Contribution
1.5 Thesis Organization
Chapter 2 Preliminaries
2.1 Introduction to Generative Model
2.1.1 Basics of Text Generation
2.1.2 Basics of Graph Generation
2.2 Introduction to Transformer Model
2.2.1 The Core Module: Attention Mechanism
2.2.2 Handling Sequential Data: Position Embedding
2.3 Transformer for Graph Data
2.4 Improve Model Performance: Pre-Training and Fine-Tuning
2.5 Evaluate Generation Result
2.5.1 Metrics for Text Generation
2.5.2 Metrics for Text Graph Generation
Chapter 3 Method
3.1 Model Overview
3.2 The Core Design: Representing Graph via Structure Token
3.2.1 Problem Setup
3.2.2 Convert Graph/Text to Structure Token
3.2.3 Convert Structure Token to Vector Representation
3.2.4 Text Generation through Structure Token
3.2.5 Graph Generation through Structure Token
3.2.6 Efficiency of Structure Token
3.3 Transformer Model Training
Chapter 4 Experiments
4.1 Experiment Setup
4.1.1 Training Setup
4.1.2 Model Parameters
4.1.3 Datasets
4.1.4 Data Processing
4.2 Effectiveness of Structure Token on Graph-to-Text Generation
4.3 Effectiveness of Structure Token on Text-to-Graph Generation
4.4 Ablation Study
Chapter 5 Conclusions and Future Work
5.1 Summary
5.2 Discussion and Future Work
5.3 Conclusions
References
