Author (English): Li-Wei Chang
Thesis title (English): Learning with Missing Data: Attention, Not Imputation, Is All You Need
Advisor (English): Shou-De Lin
Oral examination committee (English): Hung-Yi Lo, Cheng-Te Li
Keywords (English): Machine Learning, Data Mining, Missing Data, Feature Embedding, Self-Attention Mechanism
Learning from data with missing values is a common challenge in many real-world applications. Existing approaches to incomplete data have notable limitations: two-stage methods can propagate imputation error into the downstream model and degrade its performance, and some models cannot accommodate mixed feature types because of assumptions built into their design. This work proposes a flexible framework for end-to-end downstream label prediction that takes samples with missing values directly as input. Doing so avoids the harm that imputation error can cause to the downstream prediction, and a feature embedding layer lets the framework support both continuous and categorical features. A transformer encoder with a self-attention mechanism refines the hidden representation of each feature, especially the missing ones, by capturing inter-feature information and relationships. Empirical analysis and an ablation study of performance changes identify several effective settings for the framework's components. In experiments, the proposed framework achieves better downstream label prediction performance than the state-of-the-art end-to-end solution and imputation-based methods.
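The idea summarized in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the schema (`age`, `income`, `city`), the embedding width `D`, the per-feature "missing" vector, the linear embedding of continuous values, and the random weights standing in for learned parameters are all assumptions made for illustration. Each feature becomes one token, a missing value is replaced by that feature's own placeholder embedding instead of an imputed value, and a single-head scaled dot-product self-attention step lets every token, including the missing one, draw on the others.

```python
import math
import random

D = 4  # embedding width (illustrative choice, not from the thesis)
rng = random.Random(0)

def rand_vec(n=D):
    return [rng.uniform(-0.5, 0.5) for _ in range(n)]

def rand_mat(rows=D, cols=D):
    return [rand_vec(cols) for _ in range(rows)]

def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]

# Hypothetical schema: two continuous features and one categorical feature.
SCHEMA = [("age", "continuous"), ("income", "continuous"), ("city", "categorical")]

# Random values stand in for parameters that would normally be learned.
PARAMS = {}
for name, kind in SCHEMA:
    PARAMS[name] = {"missing": rand_vec()}  # placeholder embedding for a missing value
    if kind == "continuous":
        PARAMS[name]["scale"] = rand_vec()
        PARAMS[name]["bias"] = rand_vec()
    else:
        PARAMS[name]["table"] = {c: rand_vec() for c in ("taipei", "tainan")}

def embed(sample):
    """Map each feature value (None means missing) to a D-dimensional token."""
    tokens = []
    for name, kind in SCHEMA:
        p, value = PARAMS[name], sample.get(name)
        if value is None:
            tokens.append(p["missing"])  # no imputation: use the placeholder token
        elif kind == "continuous":
            tokens.append([value * s + b for s, b in zip(p["scale"], p["bias"])])
        else:
            tokens.append(p["table"][value])  # categorical: embedding lookup
    return tokens

W_Q, W_K, W_V = rand_mat(), rand_mat(), rand_mat()

def self_attention(tokens):
    """Single-head scaled dot-product self-attention over the feature tokens."""
    q = [matvec(W_Q, t) for t in tokens]
    k = [matvec(W_K, t) for t in tokens]
    v = [matvec(W_V, t) for t in tokens]
    outputs, weights = [], []
    for qi in q:
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(D) for kj in k]
        top = max(scores)  # subtract the max for a numerically stable softmax
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        attn = [e / total for e in exps]
        weights.append(attn)
        outputs.append([sum(a * vj[d] for a, vj in zip(attn, v)) for d in range(D)])
    return outputs, weights

sample = {"age": 0.3, "income": None, "city": "taipei"}  # income is missing
hidden, attn = self_attention(embed(sample))
```

Here `hidden[1]` is the refined representation of the missing `income` feature, built as an attention-weighted mix of all three tokens. A full transformer encoder would add multiple heads, residual connections, layer normalization, and a feed-forward sublayer, all omitted from this sketch.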
Oral Examination Committee Certification
Acknowledgements
Abstract (in Chinese)
Abstract
Contents
List of Figures
List of Tables
Chapter 1 Introduction
Chapter 2 Related Work
2.1 The Missing Data Mechanism
2.2 Previous Methods Tackling Missing Data Problem
Chapter 3 Methodology
3.1 Problem Definition
3.2 Proposed Framework
3.2.1 Mapping Feature Values into Feature Embeddings
3.2.2 Transformer Encoder with Self-Attention Mechanism
3.2.3 Choices of Downstream Model Inputs
3.2.4 Pretraining with Auxiliary Tasks
Chapter 4 Experiments
4.1 Experiment Settings
4.1.1 Datasets
4.1.2 Baseline Methods
4.1.3 Training Settings
4.2 Ablation Study of Options of Each Component
4.2.1 Design of Feature Embedding Layer
4.2.2 Design of Transformer Encoder
4.2.3 Choices of Downstream Model Inputs
4.2.4 Choices of Pretraining Tasks
4.3 Evaluation of Downstream Task Prediction Problem
Chapter 5 Conclusion
References
