National Digital Library of Theses and Dissertations in Taiwan

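Author: 楊長睿
Author (English): Chang-Rui Yang
Thesis title: 中文時態自動標記及其在因果論元偵測之應用
Thesis title (English): Automatic Chinese Tense Tagging and Its Application to Causal Effect Detection
Advisor: 陳信希
Advisor (English): Hsin-Hsi Chen
Committee members: 古倫維, 林川傑, 馬偉雲
Committee members (English): Lun-Wei Ku, Chuan-Jie Lin, Wei-Yun Ma
Oral defense date: 2016-07-21
Degree: Master's
Institution: National Taiwan University
Department: Graduate Institute of Computer Science and Information Engineering
Discipline: Engineering
Field: Electrical Engineering and Computer Science
Thesis type: Academic thesis
Year of publication: 2016
Graduation academic year: 104 (ROC calendar)
Language: Chinese
Pages: 51
Keywords (Chinese): 因果推論 (causal inference), 因果分析 (causal analysis)
Keywords (English): Causal Analysis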
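Causal analysis plays an important role in natural language processing, with applications such as event extraction, causal inference, and question answering. This thesis investigates the role of tense information in Chinese causal analysis. It proposes a semi-supervised model that tags Chinese tense automatically, and it uses the tense information as features for causal discourse relation classification and causal argument identification.
English marks tense grammatically, expressing different tenses through verb inflection combined with auxiliaries. Chinese verbs, however, do not change form with tense, so tense must be inferred from the surrounding collocates, which makes prediction harder than in English. In this thesis we propose a semi-supervised learning strategy: from the UM-Corpus English-Chinese parallel corpus, we define rules that extract tense on the English side and then, through English-Chinese word-level alignment, tag that tense on the Chinese side. Having automatically generated a large amount of tense-tagged Chinese data, we use it to improve the performance of a dependency-based convolutional neural network model for Chinese tense prediction.
Finally, we add the tenses predicted by the tense classification model as features in two experiments, causal discourse relation classification and identification of cause and effect in causal discourse, and we also use manually annotated gold tenses to analyze the effect of temporal features on classification. The results show that tense shifts between sentences follow different usage patterns for different causal relations, and that adding tense information as features significantly improves the accuracy of both causal discourse classification and causal argument identification.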
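The projection step described in this abstract (rule-based tense extraction on the English side, followed by transfer over word alignments) can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the function names `english_tense` and `project_tense`, the three coarse labels, and the few rules keyed on Penn Treebank POS tags are all assumptions made for illustration, and the alignment pairs stand in for the output of whatever word aligner is used.

```python
from typing import Dict, List, Optional, Tuple


def english_tense(tokens: List[str], tags: List[str], i: int) -> Optional[str]:
    """Return a coarse tense label for the English verb at index i, or None.

    Hypothetical, deliberately small rule set over Penn Treebank POS tags.
    """
    tag = tags[i]
    if not tag.startswith("VB"):
        return None
    prev = tokens[i - 1].lower() if i > 0 else ""
    if tag == "VB" and prev in {"will", "shall", "'ll"}:
        return "future"
    if tag == "VBD":
        return "past"
    if tag in {"VBP", "VBZ"}:
        return "present"
    return None  # participles and bare infinitives need richer rules


def project_tense(
    en_tokens: List[str],
    en_tags: List[str],
    zh_tokens: List[str],
    alignment: List[Tuple[int, int]],
) -> Dict[int, str]:
    """Copy tense labels from English verbs onto aligned Chinese words.

    alignment holds (english_index, chinese_index) pairs.
    The result maps Chinese token indices to projected tense labels.
    """
    labels: Dict[int, str] = {}
    for en_i, zh_i in alignment:
        if not 0 <= zh_i < len(zh_tokens):
            continue  # skip alignment links that point outside the sentence
        tense = english_tense(en_tokens, en_tags, en_i)
        if tense is not None:
            labels[zh_i] = tense
    return labels


if __name__ == "__main__":
    en = ["He", "bought", "a", "book", "yesterday"]
    tags = ["PRP", "VBD", "DT", "NN", "NN"]
    zh = ["他", "昨天", "買", "了", "一", "本", "書"]
    links = [(0, 0), (1, 2), (2, 4), (3, 6), (4, 1)]
    print(project_tense(en, tags, zh, links))  # {2: 'past'}
```

Labels produced this way are noisy pseudo-labels: an alignment error or an English verb the rules do not cover simply leaves the Chinese word untagged, which is one reason to treat such data as training material for a classifier and not as gold annotation.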


Causal analysis is an attractive topic in natural language processing and can be applied to a variety of tasks such as event extraction, causality inference, and question answering. This thesis explores the role of tense information in Chinese causal analysis. A semi-supervised approach is proposed for Chinese tense labelling. Experiments on two tasks, causal type classification and causal directionality identification, show that tense features yield significant improvements.
Unlike English, Chinese has no grammatical tense marking, so predicting the tense of a Chinese verb is more challenging. Based on English-Chinese parallel data from UM-Corpus, we propose an approach that automatically projects the tense information of English sentences onto their aligned Chinese counterparts. The resulting large set of pseudo-labelled Chinese tense instances is used to train the Chinese tense predictor. Our semi-supervised approach improves dependency-based convolutional neural network (DCNN) models for Chinese tense labelling.
Finally, the Chinese tense information is used as features for the tasks of causal type classification and causal directionality identification. Experimental results show that the tense features significantly improve performance on both tasks.
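One way to picture how tense could enter these two tasks as features is sketched below. The abstract does not specify the feature encoding, so everything here is an assumption: the label set `TENSES`, the function name `tense_transition_features`, and the choice to encode each argument's tense plus the ordered tense pair as binary indicators.

```python
from typing import Dict

# Assumed coarse label set; the thesis's actual tense inventory may differ.
TENSES = ("past", "present", "future")


def tense_transition_features(arg1_tense: str, arg2_tense: str) -> Dict[str, int]:
    """Encode the tenses of two discourse arguments as binary features.

    Produces one indicator per argument tense and one per ordered
    (arg1 -> arg2) tense pair, so a classifier can weight each shift.
    """
    features = {f"arg1_tense={t}": int(arg1_tense == t) for t in TENSES}
    features.update({f"arg2_tense={t}": int(arg2_tense == t) for t in TENSES})
    for a in TENSES:
        for b in TENSES:
            features[f"transition={a}->{b}"] = int((arg1_tense, arg2_tense) == (a, b))
    return features


if __name__ == "__main__":
    feats = tense_transition_features("past", "present")
    print([name for name, value in feats.items() if value])
```

A dictionary of this shape can be appended to whatever lexical or dependency features a classifier already uses. Keeping the ordered pair as its own indicator lets a linear model learn that a given shift, such as past to present, is more typical of one causal relation than another.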


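Acknowledgements
Abstract (Chinese)
Abstract
Table of Contents
List of Figures
List of Tables
Chapter 1. Introduction
1.1 Causal Analysis
1.2 Chinese Tense Prediction
1.3 Thesis Organization
Chapter 2. Related Work
2.1 Chinese Tense Corpora
2.2 Chinese Tense Prediction
2.3 Discourse Relation Corpora
Chapter 3. Annotating a Chinese Tense Dataset
3.1 Defining Rules to Extract Tense Information from the English Side
3.2 Linking Chinese and English Words through Bilingual Alignment
3.3 Tagging Tense on the Chinese Side
Chapter 4. Analysis of the Chinese Tense Corpus
4.1 Automatically Tagged Chinese Tense Corpus
4.2 Manually Tagged Chinese Tense Corpus
4.3 Features
4.3.1 Lexical Features
4.3.2 Dependency Features
4.4 Classification Models
4.4.1 Traditional Machine Learning Models
4.4.2 Deep Learning Models
4.5 Tense Classification Experiments
4.5.1 Cross-Validation on the Automatically Generated Dataset
4.5.2 Unsupervised Learning
4.5.3 Semi-Supervised Learning
Chapter 5. Causal Discourse Classification and Causal Argument Identification
5.1 Using Tense to Improve Causal Discourse Classification
5.2 Using Tense to Improve Causal Argument Identification
Chapter 6. Conclusion and Future Work
References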



Christopher D. Manning, Mihai Surdeanu, John Bauer, Jenny Finkel, Steven J. Bethard, and David McClosky. 2014. The Stanford CoreNLP Natural Language Processing Toolkit. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations, pages 55–60.
Tao Ge, Heng Ji, Baobao Chang and Zhifang Sui. 2015. One Tense per Scene: Predicting Tense in Chinese Conversations. Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Short Papers), pages 668–673.
Chikara Hashimoto, Kentaro Torisawa, Julien Kloetzer, Motoki Sano, Istvan Varga, Jong-Hoon Oh, and Yutaka Kidawara. 2014. Toward Future Scenario Generation: Extracting Event Causality Exploiting Semantic Relation, Context, and Association Features. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, pages 987–997, Baltimore, Maryland, USA.
Chikara Hashimoto, Kentaro Torisawa, Stijn De Saeger, Jong-Hoon Oh, and Jun’ichi Kazama. 2012. Excitatory or Inhibitory: A New Semantic Orientation Extracts Contradiction and Causality from the Web. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pages 619–630, Jeju Island, Korea.
Yoon Kim. 2014. Convolutional Neural Networks for Sentence Classification. Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1746–1751, October 25-29.
Yancui Li, Wenhe Feng, Jing Sun, Fang Kong and Guodong Zhou. 2014. Building Chinese Discourse Corpus with Connective-driven Dependency Tree Structure. Empirical Methods in Natural Language Processing (EMNLP), pages 2105–2114.
Feifan Liu, Fei Liu and Yang Liu. 2011. Learning from Chinese-English Parallel Data for Chinese Tense Prediction. International Joint Conference on Natural Language Processing, pages 1116–1124.
Mingbo Ma, Liang Huang, Bing Xiang and Bowen Zhou. 2015. Dependency-based Convolutional Neural Networks for Sentence Embedding. Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Short Papers), pages 174–179.
William Mann and Sandra Thompson. 1988. Rhetorical structure theory: Toward a functional theory of text organization. Text, 8(3):243-281.
Dong Nguyen, Tijs van den Broek, Claudia Hauff, Djoerd Hiemstra, and Michel Ehrenhard. 2015. #SupportTheCause: Identifying Motivations to Participate in Online Health Campaigns. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 2570–2576, Lisbon, Portugal.
Jong-Hoon Oh, Kentaro Torisawa, Chikara Hashimoto, Motoki Sano, Stijn De Saeger, and Kiyonori Ohtake. 2013. Why-Question Answering using Intra- and Inter-Sentential Causal Relations. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, pages 1733–1743, Sofia, Bulgaria.
Mari Olsen, David Traum, Carol Van Ess-Dykema, Amy Weinberg, and Ron Dolan. 2000. Telicity as a cue to temporal and discourse structure in Chinese-English Machine Translation. In Proceedings of NAACL-ANLP 2000 Workshop on Applied interlinguas: practical applications of interlingual approaches to NLP, pages 34–41, Seattle, Washington.
Fabian Pedregosa, Gaël Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel, Peter Prettenhofer, Ron Weiss, Vincent Dubourg, Jake Vanderplas, Alexandre Passos, David Cournapeau, Matthieu Brucher, Matthieu Perrot, Édouard Duchesnay. 2011. Scikit-learn: Machine Learning in Python. JMLR 12, pp. 2825-2830.
Rashmi Prasad, Alan Lee, Nikhil Dinesh, Eleni Miltsakaki, Geraud Campion, Aravind Joshi, and Bonnie Webber. 2008. Penn Discourse Treebank Version 2.0 LDC2008T05. Web Download. Philadelphia: Linguistic Data Consortium, 2008. https://catalog.ldc.upenn.edu/LDC2008T05
Shohei Tanaka, Naoaki Okazaki, and Mitsuru Ishizuka. 2012. Acquiring and Generalizing Causal Inference Rules from Deverbal Noun Constructions. In Proceedings of COLING 2012: Posters, pages 1209–1218, COLING 2012, Mumbai, India.
Liang Tian, Derek F. Wong, Lidia S. Chao, Paulo Quaresma, Francisco Oliveira, Yi Lu, Shuo Li, Yiming Wang, Longyue Wang. 2014. UM-Corpus: A Large English-Chinese Parallel Corpus for Statistical Machine Translation. Proceedings of the 9th International Conference on Language Resources and Evaluation (LREC'14), Reykjavik, Iceland.
Fuyi Xing. 2003. Research of Chinese complex sentence. The Commercial Press, Beijing, CN (in Chinese)
Nianwen Xue. 2008. Automatic inference of the temporal location of situations in Chinese text. Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing, pages 707–714.
Nianwen Xue, Hua Zhong and Kai-Yun Chen. 2008. Annotating “tense” in a tense-less language. In Proceedings of the Fifth International Conference on Language Resources and Evaluation, Marrakech, Morocco.
Nianwen Xue and Yuchen Zhang. 2014. Buy one get one free: Distant annotation of Chinese tense, event type, and modality. Language Resources and Evaluation Conference, pages 1412–1416.
Yang Ye. 2007. Automatic Tense and Aspect Translation between Chinese and English. Ph.D. thesis, University of Michigan.
Yuchen Zhang and Nianwen Xue. 2014. Automatic Inference of the Tense of Chinese Events Using Implicit Linguistic Information. Empirical Methods in Natural Language Processing (EMNLP), pages 1902–1911.

