Researcher: 陳冠君
Researcher (English): Kuan-Chun Chen
Thesis Title (Chinese): 動態神經網路：結合神經互信息應用於文本分類
Thesis Title (English): Dynamic Neural Networks: Apply Neural Mutual Information for Text Classification
Advisors: 李國榮、李政德
Advisors (English): Kuo-Jung Lee, Cheng-Te Li
Degree: Master's
Institution: 國立成功大學 (National Cheng Kung University)
Department: 數據科學研究所 (Institute of Data Science)
Discipline: Mathematics and Statistics
Field: Statistics
Thesis Type: Academic thesis
Publication Year: 2020
Graduation Academic Year: 108
Language: English
Number of Pages: 39
Keywords (Chinese): 神經網路搜索、互信息、貝氏定理
Keywords (English): Neural Architecture Search, Mutual Information, Bayesian Theorem
Record statistics:
  • Cited: 0
  • Views: 61
  • Rating:
  • Downloads: 0
  • Bookmarked: 0
Abstract (Chinese): Learning text representations is essential for natural language processing tasks such as text classification and dialogue generation. Recently, various neural network models have been proposed to learn text representations, but such manually designed architectures admit countless variants, and we do not know which one is optimal. As Neural Architecture Search techniques have matured, we can use them to search for network architectures automatically. However, most NAS techniques aim to maximize classification accuracy on the dataset rather than focusing on learning text representations, which can lead to overfitting. We therefore incorporate mutual information into representation learning, jointly maximizing classification accuracy and mutual information. Our method applies NAS to natural language processing combined with mutual information for joint learning; it outperforms other text classification models on text classification datasets and performs well when training data are scarce.
Abstract (English): Learning text representations is important for text classification, text generation, and other Natural Language Processing (NLP) tasks. Recently, diverse model structures have been proposed to learn text representations, but such manually designed models admit countless combinations, so we do not know which structure is optimal. Neural Architecture Search (NAS) techniques have been developed to address this problem. However, most NAS techniques try to achieve high classification accuracy on a dataset rather than focusing on learning input representations that would themselves benefit classification accuracy. Hence, we compute the mutual information between the input representation and the output representation of each layer of the neural network and maximize it; by maximizing mutual information, we learn better text representations. We propose a method that applies NAS to search the model structure and uses mutual information as part of the objective function for text classification. Our method outperforms other models on text classification and achieves state-of-the-art results in scarce-data settings.
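The abstract combines two mechanisms: a searchable model structure in the NAS style and a joint objective that adds a neural mutual-information estimate to the classification loss. The following is a minimal PyTorch sketch of how these two pieces could fit together; it is an illustration under assumptions, not the thesis's actual implementation, and every name in it (MixedOp, MINECritic, joint_loss, the candidate operation set, and the weighting factor mi_weight) is hypothetical.

import torch
import torch.nn as nn
import torch.nn.functional as F

class Zero(nn.Module):
    # The "none" operation: drops the edge by returning zeros of the same shape.
    def forward(self, x):
        return torch.zeros_like(x)

class MixedOp(nn.Module):
    # Softmax-weighted sum of candidate operations on a (batch, channels, length)
    # input, in the spirit of a DARTS-style continuous relaxation.
    def __init__(self, channels):
        super().__init__()
        self.ops = nn.ModuleList([
            Zero(),                                                               # none
            nn.Conv1d(channels, channels, kernel_size=3, padding=1),              # convolution
            nn.Conv1d(channels, channels, kernel_size=3, padding=2, dilation=2),  # dilated convolution
            nn.MaxPool1d(kernel_size=3, stride=1, padding=1),                     # pooling
        ])
        self.alpha = nn.Parameter(torch.zeros(len(self.ops)))  # architecture logits

    def forward(self, x):
        weights = F.softmax(self.alpha, dim=0)
        return sum(w * op(x) for w, op in zip(weights, self.ops))

class MINECritic(nn.Module):
    # Scores pairs of (input representation, output representation) vectors.
    def __init__(self, in_dim, out_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim + out_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, x, y):
        return self.net(torch.cat([x, y], dim=-1))

def mutual_information_lower_bound(critic, x, y):
    # Donsker-Varadhan bound: E_joint[T] - log E_marginal[exp(T)], with the marginal
    # approximated by pairing x with a shuffled copy of y.
    joint = critic(x, y).mean()
    y_shuffled = y[torch.randperm(y.size(0))]
    log_marginal = torch.logsumexp(critic(x, y_shuffled), dim=0) - torch.log(
        torch.tensor(float(y.size(0))))
    return joint - log_marginal

def joint_loss(logits, labels, critic, layer_in, layer_out, mi_weight=0.1):
    # Cross-entropy for accuracy minus a weighted MI estimate, so minimizing the loss
    # jointly maximizes classification accuracy and mutual information.
    ce = F.cross_entropy(logits, labels)
    mi = mutual_information_lower_bound(critic, layer_in, layer_out)
    return ce - mi_weight * mi

The mixed operation covers the candidate operations named in the table of contents (none, convolution, dilated convolution, pooling), and the critic implements a MINE-style Donsker-Varadhan lower bound on the mutual information between layer inputs and outputs; the relative weighting of the two loss terms is an assumed hyperparameter.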
Abstract (Chinese) i
Abstract (English) ii
Acknowledgements iii
Table of Contents iv
List of Tables vi
List of Figures vii
Chapter 1. Introduction 1
1.1. Background . . . . . . . . . . . . . . . . . . . . 1
1.1.1. Rule-based systems . . . . . . . . . . . . . . . 1
1.1.2. Machine learning systems . . . . . . . . . . . . 2
1.1.3. Hybrid systems . . . . . . . . . . . . . . . . . 2
1.2. Motivation . . . . . . . . . . . . . . . . . . . . 3
1.3. Problem . . . . . . . . . . . . . . . . . . . . . .3
1.4. Challenge . . . . . . . . . . . . . . . . . . . . .4
1.5. Our Method . . . . . . . . . . . . . . . . . . . . 4
1.6. Paper Structure . . . . . . . . . . . . . . . . . .5
Chapter 2. Related Work 6
2.1. Text Classification . . . . . . . . . . . . . . . .6
2.1.1. C-LSTM . . . . . . . . . . . . . . . . . . . . . 6
2.1.2. Recurrent Convolutional Neural Networks . . . . .7
2.1.3. Hierarchical Attention Network . . . . . . . . . 8
2.2. Neural Architecture Search . . . . . . . . . . . . 11
2.3. Mutual Information . . . . . . . . . . . . . . . . 12
2.4. Short Summary . . . . . . . . . . . . . . . . . . .13
Chapter 3. Methodology 14
3.1. Input Representation . . . . . . . . . . . . . . . 14
3.2. Model Search . . . . . . . . . . . . . . . . . . . 16
3.3. Selected Operations . . . . . . . . . . . . . . . .17
3.3.1. None Operation . . . . . . . . . . . . . . . . . 18
3.3.2. Convolution . . . . . . . . . . . . . . . . . . .18
3.3.3. Dilated Convolution . . . . . . . . . . . . . . .19
3.3.4. Pooling . . . . . . . . . . . . . . . . . . . . .19
3.4. Discrete Layer . . . . . . . . . . . . . . . . . . 20
3.5. Output . . . . . . . . . . . . . . . . . . . . . . 22
3.6. Joint Learning . . . . . . . . . . . . . . . . . . 22
3.7. Algorithms . . . . . . . . . . . . . . . . . . . . 23
Chapter 4. Experiment 24
4.1. Experiment Setting . . . . . . . . . . . . . . . . 24
4.2. Data Description . . . . . . . . . . . . . . . . . 24
4.2.1. IMDB . . . . . . . . . . . . . . . . . . . . . . 24
4.2.2. AG News . . . . . . . . . . . . . . . . . . . . .24
4.2.3. Yelp . . . . . . . . . . . . . . . . . . . . . . 25
4.3. Baselines . . . . . . . . . . . . . . . . . . . . .25
4.3.1. C-LSTM . . . . . . . . . . . . . . . . . . . . . 25
4.3.2. Transformer . . . . . . . . . . . . . . . . . . .26
4.3.3. ENAS . . . . . . . . . . . . . . . . . . . . . . 26
4.3.4. SMASH . . . . . . . . . . . . . . . . . . . . . .26
4.3.5. TextNAS . . . . . . . . . . . . . . . . . . . . .26
4.4. Evaluation Metric . . . . . . . . . . . . . . . . .26
4.5. Experiment Result . . . . . . . . . . . . . . . . .27
4.5.1. Compare with Baselines . . . . . . . . . . . . . 27
4.5.2. Searched Model Structure . . . . . . . . . . . . .28
4.5.3. Parameter Analysis . . . . . . . . . . . . . . . 28
4.5.4. Ablation Study . . . . . . . . . . . . . . . . . 30
Chapter 5. Conclusion and Future Work 33
References 34
Appendix A. Searched Model Structure 36
A.1. Searched model with different numbers of nodes on IMDB . . . . . . . . . 36
A.2. Searched model with different numbers of nodes on AG News . . . . . . . 37
A.3. Searched model with different numbers of nodes on Yelp . . . . . . . . . . 38