
National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)

Detailed Record

Author: 張立暐 (Li-Wei Chang)
Title: 針對資料缺失時的學習方法:只基於注意力機制且無缺失值插補之架構
Title (English): Learning with Missing Data: Attention, Not Imputation, Is All You Need
Advisor: 林守德 (Shou-De Lin)
Committee members: 駱宏毅 (Hung-Yi Lo), 李政德 (Cheng-Te Li)
Defense date: 2021-09-23
Degree: Master's
Institution: 國立臺灣大學 (National Taiwan University)
Department: 資訊工程學研究所 (Graduate Institute of Computer Science and Information Engineering)
Discipline: Engineering
Field: Electrical Engineering and Computer Science
Document type: Academic thesis
Year of publication: 2021
Academic year of graduation: 109
Language: English
Pages: 43
Keywords (Chinese): 機器學習, 資料探勘, 資料缺失, 特徵嵌入, 自注意力機制
Keywords (English): Machine Learning, Data Mining, Missing Data, Feature Embedding, Self-Attention Mechanism
DOI: 10.6342/NTU202104344
Abstract: Machine learning from data with missing values has become a common challenge in many real-world applications. Existing approaches to handling data incompleteness suffer from several limitations and drawbacks: in two-stage methods, imputation error can propagate to the downstream model and degrade prediction accuracy, and assumptions baked into some model designs leave them unable to accommodate data with mixed feature types. This thesis proposes a flexible end-to-end framework for downstream label prediction that takes data samples with missing values directly as input, thereby avoiding the adverse effect of imputation error on the downstream task, and supports both continuous and categorical features through a feature embedding layer. Using a transformer encoder architecture with self-attention, the framework improves the hidden representation of each feature (especially the missing ones) by capturing inter-feature information and relationships. Several effective settings for the framework's components are identified through empirical analysis and an ablation study of performance changes. Finally, experiments show that the proposed framework achieves better downstream label prediction performance than state-of-the-art end-to-end solutions and imputation-based methods.
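The mechanism the abstract describes (one embedding token per feature, a learned mask embedding substituted wherever a value is missing, and self-attention letting missing features draw information from observed ones) can be sketched as follows. This is a minimal single-head NumPy illustration under assumed names and dimensions, not the thesis's actual architecture; the thesis's embedding layer also handles categorical features, and its encoder is a full transformer.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z, axis=-1):
    # Numerically stable softmax over the given axis.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

d = 8  # embedding dimension (illustrative)
F = 5  # number of features

# Per-feature "tokenizer": scalar value -> d-dim embedding (assumed form).
W = rng.normal(size=(F, d))
b = rng.normal(size=(F, d))
mask_token = rng.normal(size=d)  # learned [MASK] embedding stands in for missing values
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))

def encode(x, missing):
    """x: (F,) feature values; missing: (F,) bool mask. Returns (F, d) hidden states."""
    tokens = x[:, None] * W + b      # one embedding token per feature
    tokens[missing] = mask_token     # no imputation: missing slots get the mask embedding
    Q, K, V = tokens @ Wq, tokens @ Wk, tokens @ Wv
    attn = softmax(Q @ K.T / np.sqrt(d))  # every feature attends to every feature
    return attn @ V                  # missing features aggregate info from observed ones

x = rng.normal(size=F)
missing = np.array([False, True, False, False, True])
h = encode(x, missing)
print(h.shape)  # (5, 8)
```

In a trained model, the rows of `h` belonging to missing features are weighted mixtures of the observed features' value vectors, which is how the framework improves their hidden representations without ever filling in the raw values.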
口試委員審定書 (Thesis Committee Certification) i
Acknowledgements iii
摘要 (Chinese Abstract) iv
Abstract v
Contents vii
List of Figures ix
List of Tables xi
Chapter 1 Introduction 1
Chapter 2 Related Work 5
2.1 The Missing Data Mechanism 5
2.2 Previous Methods Tackling Missing Data Problem 6
Chapter 3 Methodology 9
3.1 Problem Definition 9
3.2 Proposed Framework 11
3.2.1 Mapping Feature Values into Feature Embeddings 11
3.2.2 Transformer Encoder with Self-Attention Mechanism 15
3.2.3 Choices of Downstream Model Inputs 18
3.2.4 Pretraining with Auxiliary Tasks 19
Chapter 4 Experiments 22
4.1 Experiment Settings 22
4.1.1 Datasets 22
4.1.2 Baseline Methods 23
4.1.3 Training Settings 25
4.2 Ablation Study of Options of Each Component 26
4.2.1 Design of Feature Embedding Layer 26
4.2.2 Design of Transformer Encoder 27
4.2.3 Choices of Downstream Model Inputs 27
4.2.4 Choices of Pretraining Tasks 29
4.3 Evaluation of Downstream Task Prediction Problem 30
Chapter 5 Conclusion 39
References 40
M. Arjovsky, S. Chintala, and L. Bottou. Wasserstein generative adversarial networks. In International Conference on Machine Learning, volume 70, pages 214–223. PMLR, 2017.
J. L. Ba, J. R. Kiros, and G. E. Hinton. Layer normalization. arXiv preprint arXiv:1607.06450, 2016.
L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone. Classification and regression trees. Wadsworth Inc, 1984.
C.-C. Chang and C.-J. Lin. LIBSVM: a library for support vector machines. ACM Transactions on Intelligent Systems and Technology (TIST), 2(3):1–27, 2011.
D. Dua and C. Graff. UCI machine learning repository, 2017.
P. J. García-Laencina, J.-L. Sancho-Gómez, and A. R. Figueiras-Vidal. Pattern classification with missing data: a review. Neural Computing and Applications, 19(2):263–282, 2010.
L. Gondara and K. Wang. MIDA: Multiple imputation using denoising autoencoders. In Pacific-Asia Conference on Knowledge Discovery and Data Mining, volume 10939, pages 260–272. Springer, 2018.
D. Grangier and I. Melvin. Feature set embedding for incomplete data. In Advances in Neural Information Processing Systems, volume 23, pages 793–801, 2010.
W. L. Hamilton, R. Ying, and J. Leskovec. Inductive representation learning on large graphs. In Proceedings of the 31st International Conference on Neural Information Processing Systems, pages 1025–1035, 2017.
K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 770–778, 2016.
U. Hwang, D. Jung, and S. Yoon. HexaGAN: Generative adversarial nets for real world classification. In Proceedings of the 36th International Conference on Machine Learning, volume 97, pages 2921–2930. PMLR, 2019.
K.-Y. Kim, B.-J. Kim, and G.-S. Yi. Reuse of imputed data in microarray analysis increases imputation efficiency. BMC Bioinformatics, 5(1):1–9, 2004.
D. P. Kingma and J. Ba. Adam: A method for stochastic optimization. In International Conference on Learning Representations (Poster), 2015.
S. C.-X. Li, B. Jiang, and B. Marlin. MisGAN: Learning from incomplete data with generative adversarial networks. In International Conference on Learning Representations, 2019.
R. J. Little and D. B. Rubin. Statistical analysis with missing data, volume 793. John Wiley & Sons, 2019.
N. Lopes. Handling missing values via a neural selective input model. Neural Network World, 22:357–370, 01 2012.
R. Mazumder, T. Hastie, and R. Tibshirani. Spectral regularization algorithms for learning large incomplete matrices. The Journal of Machine Learning Research, 11:2287–2322, 2010.
F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.
K. Pelckmans, J. De Brabanter, J. A. Suykens, and B. De Moor. Handling missing values in support vector machine classifiers. Neural Networks, 18(5-6):684–692, 2005.
M. Śmieja, Ł. Struski, J. Tabor, B. Zieliński, and P. Spurek. Processing of missing data by neural networks. In Proceedings of the 32nd International Conference on Neural Information Processing Systems, volume 31, pages 2724–2734, 2018.
G. Somepalli, M. Goldblum, A. Schwarzschild, C. B. Bruss, and T. Goldstein. SAINT: Improved neural networks for tabular data via row attention and contrastive pretraining. arXiv preprint arXiv:2106.01342, 2021.
D. J. Stekhoven and P. Bühlmann. MissForest: non-parametric missing value imputation for mixed-type data. Bioinformatics, 28(1):112–118, 2012.
S. Van Buuren and K. Groothuis-Oudshoorn. mice: Multivariate imputation by chained equations in R. Journal of Statistical Software, 45(1):1–67, 2011.
A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin. Attention is all you need. In Advances in Neural Information Processing Systems, pages 5998–6008, 2017.
P. Vincent, H. Larochelle, Y. Bengio, and P.-A. Manzagol. Extracting and composing robust features with denoising autoencoders. In Proceedings of the 25th International Conference on Machine Learning, pages 1096–1103, 2008.
D. Williams, X. Liao, Y. Xue, and L. Carin. Incomplete-data classification using logistic regression. In Proceedings of the 22nd International Conference on Machine Learning, pages 972–979, 2005.
J. Xia, S. Zhang, G. Cai, L. Li, Q. Pan, J. Yan, and G. Ning. Adjusted weight voting algorithm for random forests in handling missing values. Pattern Recognition, 69:52–60, 2017.
J. Yoon, J. Jordon, and M. van der Schaar. GAIN: Missing data imputation using generative adversarial nets. In International Conference on Machine Learning, pages 5689–5698. PMLR, 2018.
J. You, X. Ma, Y. Ding, M. J. Kochenderfer, and J. Leskovec. Handling missing data with graph representation learning. In Advances in Neural Information Processing Systems, volume 33, pages 19075–19087, 2020.
H.-F. Yu, N. Rao, and I. S. Dhillon. Temporal regularized matrix factorization for high-dimensional time series prediction. In Advances in Neural Information Processing Systems, volume 29, pages 847–855, 2016.