
臺灣博碩士論文加值系統 (National Digital Library of Theses and Dissertations in Taiwan)


Detailed Record

Author: 李廣和
Author (English): Guang-He Lee
Thesis Title: 非監督式語意表徵的強化學習方法
Thesis Title (English): Unsupervised Sense Representation by Reinforcement Learning
Advisor: 陳縕儂
Advisor (English): Yun-Nung Chen
Date of Oral Defense: 2017-06-23
Degree: Master's
Institution: 國立臺灣大學 (National Taiwan University)
Department: 資訊工程學研究所 (Graduate Institute of Computer Science and Information Engineering)
Discipline: Engineering
Field: Electrical and Computer Engineering
Thesis Type: Academic thesis
Year of Publication: 2017
Graduation Academic Year: 105
Language: English
Number of Pages: 44
Keywords (Chinese): 非監督式語意表徵、表徵學習、加強學習
Keywords (English): Unsupervised Sense Representation, Representation Learning, Reinforcement Learning
Usage statistics:
  • Cited by: 0
  • Views: 1444
  • Downloads: 0
  • Bookmarked: 0
Abstract (Chinese): This thesis addresses the word sense ambiguity problem in an unsupervised manner, where sense representation learning is built on top of a context-aware sense selection mechanism. Most prior work on learning sense representations could not achieve both fine-grained representation learning and efficient sense selection. This thesis proposes a modular framework in which flexible modules optimize their respective objectives: one module selects the sense of a word in context, and the other learns a representation vector for the selected sense, yielding the first purely sense-level representation learning model that supports linear-time sense selection. In contrast to conventional architectures, we adopt reinforcement learning, which brings three benefits. First, the decision-making view of reinforcement learning describes how humans select senses better than probabilistic or clustering approaches. Second, with reinforcement learning we propose the first unsupervised sense representation model under a modular framework that requires only a single objective function. Third, we introduce the diverse exploration techniques of reinforcement learning into sense selection to improve robustness. Experimental results on benchmark datasets show that the proposed method outperforms state-of-the-art approaches on both synonym selection and contextual word similarity measured by maximum cosine similarity (MaxSimC).
Abstract (English): This thesis addresses the word sense ambiguity issue in an unsupervised manner, where word sense representations are learned jointly with a word sense selection mechanism given contexts. Prior work on learning sense embeddings suffered from either coarse-grained representation learning or inefficient sense selection. The proposed modular framework implements flexible modules that optimize distinct mechanisms, sense selection and representation learning, achieving the first purely sense-level representation learning system with linear-time sense selection. In contrast to conventional methods, we leverage reinforcement learning as the learning algorithm, which exhibits the following advantages. First, the decision-making process under reinforcement learning captures the sense selection mechanism better than probabilistic and clustering methods. Second, our reinforcement learning algorithm realizes the first single objective function for modular unsupervised sense representation systems. Finally, we introduce various exploration techniques under reinforcement learning on sense selection to enhance robustness. Experiments on benchmark data show that the proposed approach achieves state-of-the-art performance on synonym selection as well as on contextual word similarity in terms of MaxSimC.
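The abstracts above describe a two-module design: a sense selection module that picks a sense of a target word from its context, and a sense representation module that learns a vector for the selected sense, with reinforcement-learning-style exploration during selection. As a rough illustration only, the toy Python sketch below mimics that division of labor with an epsilon-greedy selector and a skip-gram-like update; the vocabulary, scoring function, reward, and update rule are assumptions made for this sketch, not the thesis's actual formulation.

```python
# Illustrative sketch only: a toy two-module loop in the spirit of the abstract
# (sense selection + sense representation learning). The scoring function, reward,
# and update rule below are simplifying assumptions, not the thesis's method.
import numpy as np

rng = np.random.default_rng(0)

VOCAB = ["bank", "river", "money", "water", "loan"]
N_SENSES = 2      # assumed fixed number of senses per word
DIM = 8           # embedding dimensionality
EPSILON = 0.1     # exploration rate for epsilon-greedy sense selection
LR = 0.05         # learning rate for the representation module

word2id = {w: i for i, w in enumerate(VOCAB)}
# Sense representation module: one vector per (word, sense) pair.
sense_vecs = rng.normal(scale=0.1, size=(len(VOCAB), N_SENSES, DIM))
# Context vectors, used both for scoring senses and as skip-gram output vectors.
ctx_vecs = rng.normal(scale=0.1, size=(len(VOCAB), DIM))

def select_sense(word, context):
    """Sense selection module: score each candidate sense of `word` against the
    averaged context vector (cost linear in the number of senses), then pick
    greedily or explore with probability EPSILON."""
    ctx = np.mean([ctx_vecs[word2id[c]] for c in context], axis=0)
    scores = sense_vecs[word2id[word]] @ ctx
    if rng.random() < EPSILON:
        return int(rng.integers(N_SENSES))
    return int(np.argmax(scores))

def update_representation(word, sense, context):
    """Sense representation module: a skip-gram-like positive update that pulls
    the selected sense vector toward its context vectors. Negative sampling and
    feeding the returned reward back into the selector are omitted for brevity."""
    v = sense_vecs[word2id[word], sense]   # view; updated in place
    reward = 0.0
    for c in context:
        u = ctx_vecs[word2id[c]]           # view; updated in place
        p = 1.0 / (1.0 + np.exp(-v @ u))   # sigmoid co-occurrence score
        g = LR * (1.0 - p)
        v += g * u
        u += g * v
        reward += float(np.log(p))
    return reward                          # reward-like signal for the selector

# Toy corpus: two "senses" of `bank`, each with its own context words.
corpus = [("bank", ["river", "water"]), ("bank", ["money", "loan"])] * 200
for target, context in corpus:
    chosen = select_sense(target, context)
    update_representation(target, chosen, context)

for s in range(N_SENSES):
    sims = {c: round(float(sense_vecs[word2id["bank"], s] @ ctx_vecs[word2id[c]]), 3)
            for c in ("river", "money")}
    print(f"bank sense {s}: {sims}")
```

The per-sense dot-product scoring keeps selection cost linear in the number of candidate senses, which is one concrete reading of the linear-time sense selection claimed in the abstract; everything else here is deliberately simplified.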
Abstract (Chinese) ii
Abstract iii
Contents iv
List of Figures vi
List of Tables vii
1 Introduction 1
1.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Word Representation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.3 Sense Representation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.4 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.5 Thesis Structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2 Background Review 6
2.1 Word Representation Learning . . . . . . . . . . . . . . . . . . . . . . . 6
2.2 Deep Learning for Natural Language Processing . . . . . . . . . . . . . . 7
2.3 Markov Decision Process and Reinforcement Learning . . . . . . . . . . 8
2.3.1 Markov Decision Process . . . . . . . . . . . . . . . . . . . . . 8
2.3.2 Reinforcement Learning . . . . . . . . . . . . . . . . . . . . . . 9
2.3.3 Exploitation and Exploration . . . . . . . . . . . . . . . . . . . . 9
3 Methodology 11
3.1 Model Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
3.1.1 Sense Selection Module . . . . . . . . . . . . . . . . . . . . . . 12
3.1.2 Sense Representation Module . . . . . . . . . . . . . . . . . . . 14
3.2 Joint Formulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
3.2.1 Markov Decision Process . . . . . . . . . . . . . . . . . . . . . 15
3.2.2 Reinforcement Learning . . . . . . . . . . . . . . . . . . . . . . 16
3.2.3 Sense Selection Strategy . . . . . . . . . . . . . . . . . . . . . . 20
3.2.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
3.3 Unstable Factor and Stabilization Methods . . . . . . . . . . . . . . . . . 22
3.3.1 Unstable Factor . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
3.3.2 Value Function Factorization . . . . . . . . . . . . . . . . . . . . 22
3.3.3 One-sided Optimization . . . . . . . . . . . . . . . . . . . . . . 24
4 Experiments 25
4.1 Experimental Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
4.2 Experiment 1: Contextual Word Similarity . . . . . . . . . . . . . . . . . 26
4.2.1 Experiment Results for Different Formulations . . . . . . . . . . 26
4.2.2 Experiment Results with the State-of-the-art . . . . . . . . . . . . 29
4.3 Experiment 2: Synonym Selection . . . . . . . . . . . . . . . . . . . . . 30
4.4 Qualitative Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
4.5 Visualization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
5 Related Work 36
5.1 Clustering Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
5.2 Probabilistic Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
5.3 Lexical Ontology Based Methods . . . . . . . . . . . . . . . . . . . . . 37
6 Conclusion 39
Bibliography 40