臺灣博碩士論文加值系統 (National Digital Library of Theses and Dissertations in Taiwan)

Detailed Record

Author: 李漪莛
Author (English): Yi-Ting Lee
Title (Chinese): 基於GRU的序列對序列自動編碼器的神經元功能之分析
Title (English): Exposing the Functionalities of Neurons for Gated Recurrent Unit Based Sequence-to-Sequence Autoencoder
Advisor: 林守德
Advisor (English): Shou-De Lin
Committee Members: 林智仁, 林軒田, 李宏毅, 陳縕儂
Committee Members (English): Chih-Jen Lin, Hsuan-Tien Lin, Hung-Yi Lee, Yun-Nung Chen
Oral Defense Date: 2020-07-31
Degree: Master's
Institution: National Taiwan University
Department: Graduate Institute of Computer Science and Information Engineering
Discipline: Engineering
Field: Electrical and Computer Engineering
Thesis Type: Academic thesis
Publication Year: 2020
Graduation Academic Year: 108
Language: English
Pages: 41
Keywords (Chinese): GRU、序列對序列模型、自動編碼器、神經元功能
Keywords (English): Gated Recurrent Unit, Sequence-to-Sequence Model, Autoencoder, Neuron functionalities
DOI: 10.6342/NTU202002828
Abstract (Chinese): The purpose of this thesis is to report scientific findings about the Seq2Seq model. It is well known that, because RNNs are inherently recursive, analysis at the neuron level is more challenging than analyzing DNN or CNN models. This thesis provides a neuron-level analysis to explain why a vanilla GRU-based Seq2Seq model, without any attention mechanism, can output the correct tokens in the correct order with very high accuracy. We discover two sets of neurons, storage neurons and count-down neurons, which store token information and position information respectively. By analyzing how these two groups of neurons evolve over time steps and how they interact, we can uncover the mechanism by which the model produces the right token at the right position.
Abstract (English): The goal of this paper is to report certain scientific discoveries about a Seq2Seq model. It is known that analyzing the behavior of RNN-based models at the neuron level is more challenging than analyzing DNN or CNN models, because of their inherently recursive mechanism. This paper provides a neuron-level analysis to explain why a vanilla GRU-based Seq2Seq model without attention can successfully output the correct tokens in the correct order with very high accuracy. We found two sets of neurons, storage neurons and count-down neurons, which store token information and position information respectively. By analyzing how these two groups of neurons transform across time steps and how they interact, we can uncover the mechanism by which the model produces the right tokens in the right positions.
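To make the setup concrete, below is a minimal PyTorch sketch of the kind of model the abstract describes: a vanilla GRU-based Seq2Seq autoencoder with no attention, plus a simple hook that clamps one neuron of the decoder hidden state so its effect on the reconstructed tokens can be inspected, in the spirit of the verification-by-manipulation step listed in Chapter 4. This is illustrative only and not the author's implementation; the class name GRUSeq2SeqAutoencoder, the layer sizes, the use of teacher forcing, and the zero-vector start input are all assumptions.

```python
# Minimal sketch (assumptions, not the thesis code): a GRU Seq2Seq autoencoder
# without attention, plus a hook to clamp one decoder-hidden-state neuron.
import torch
import torch.nn as nn


class GRUSeq2SeqAutoencoder(nn.Module):
    def __init__(self, vocab_size, emb_dim=128, hid_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.decoder = nn.GRUCell(emb_dim, hid_dim)
        self.out = nn.Linear(hid_dim, vocab_size)

    def forward(self, tokens, clamp_neuron=None, clamp_value=0.0):
        """Reconstruct `tokens` (batch, T). If `clamp_neuron` is given, that
        dimension of the decoder hidden state is overwritten at every step,
        the kind of intervention used to probe a neuron's functionality."""
        emb = self.embed(tokens)                      # (batch, T, emb_dim)
        _, h = self.encoder(emb)                      # (1, batch, hid_dim)
        h = h.squeeze(0)                              # decoder starts from the sentence code
        # Zero vector stands in for a <bos> embedding (an assumption of this sketch).
        inp = torch.zeros(tokens.size(0), self.embed.embedding_dim, device=tokens.device)
        logits = []
        for t in range(tokens.size(1)):
            h = self.decoder(inp, h)
            if clamp_neuron is not None:              # intervene on one hidden neuron
                h = h.clone()
                h[:, clamp_neuron] = clamp_value
            logits.append(self.out(h))
            inp = self.embed(tokens[:, t])            # teacher forcing for analysis
        return torch.stack(logits, dim=1)             # (batch, T, vocab_size)


# Usage: compare outputs with and without clamping one neuron to see whether the
# reconstructed tokens or their positions change (on a trained model; here the
# weights are random, so the numbers are only a demonstration of the procedure).
model = GRUSeq2SeqAutoencoder(vocab_size=10000)
x = torch.randint(0, 10000, (4, 12))
normal = model(x).argmax(-1)
clamped = model(x, clamp_neuron=42).argmax(-1)
print((normal != clamped).float().mean().item())
```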
Acknowledgements i
Abstract (Chinese) ii
Abstract iii
List of Figures vi
List of Tables ix

1 Introduction 1
2 Related Works 5
3 Experiments Setup 7
3.1 Data Collection ..... 7
3.2 Model and Training details ..... 8
4 Neurons Identification Algorithm 10
4.1 Hypothesis formulation and candidate neurons generation ..... 10
4.2 Filtering ..... 12
4.3 Verification by manipulating the neuron values ..... 13
5 Hypotheses Verification 15
5.1 In each hidden state, how many neurons are storing the information "y_T = token_A"? ..... 15
5.2 Do storage neurons change over different time steps? ..... 18
5.3 If the same token is to be output at different positions T, what is the relationship between the two sets of storage neurons? ..... 22
5.4 How does h_t store all token information efficiently? ..... 25
5.5 Does each token have its own set of count-down neurons? ..... 27
5.6 How do count-down neurons behave? ..... 29
5.7 Why do the storage neurons remain unchanged and then start to change at T - k? ..... 29
5.8 How do count-down neurons affect storage neurons? ..... 33
5.9 Summary of findings ..... 36
6 Conclusion 38
Reference 39