Graduate Student: Su, Yu-Chen
Thesis Title: A Study of Biomedical Text Mining for Protein-protein Interaction Passage Extraction
Advisor: Hsu, Wen-Lian
Oral Defense Committee: Chang, Yung-Chun; Dai, Hong-Jie
Keywords: Text Mining; Protein-Protein Interaction; Interaction Pattern Generation; Convolution Tree Kernel
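In recent years, the sharp growth in the volume of biomedical literature has made the need for automated relation extraction more pressing. Among the types of entity relations, protein-protein interactions offer diverse perspectives on the function and structural organization of cells, and this knowledge can explain the molecular mechanisms of biological pathways. Identifying whether an interaction exists between proteins mentioned in biomedical literature is a frequently studied topic in text mining. This study proposes a module that generates interaction patterns to capture common protein-protein interaction rules; the module was previously used in the BioCreative 2015 competition. This study also proposes an interaction pattern tree kernel, which combines protein-protein interaction rules with the convolution tree kernel to identify protein-protein interactions. The interaction pattern tree integrates syntactic and semantic information into the tree structure through three steps: branching, pruning, and ornamenting. The proposed method uses LLL, IEPA, HPRD50, AIMed, and BioInfer as datasets and is evaluated through cross-validation, cross-learning, and cross-corpus settings. The experimental results show that the method is effective and outperforms several well-known protein-protein interaction extraction methods. In addition, this study discusses several effective features and suggested research directions that may serve as a reference for future research.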
In recent years, the volume of biomedical literature has grown rapidly, and the need for automated relation extraction methods has become critical. Among all types of relations, knowledge about protein-protein interactions, including information on various aspects of the structural and functional organization of cells, can shed light on the molecular mechanisms of biological processes. Identifying the interactions between proteins mentioned in biomedical literature is therefore one of the most frequently discussed text mining topics in the life sciences. In this paper we propose PIPE, an interaction pattern generation module used in the BioCreative 2015 competition to capture frequent protein-protein interaction (PPI) patterns within text. We also present an interaction pattern tree kernel method that integrates the PPI patterns with the convolution tree kernel to extract protein-protein interactions. The interaction pattern tree is constructed through three operations: branching, pruning, and ornamenting. The proposed tree structure incorporates syntactic, content, and semantic information from the text. The methods were evaluated on the LLL, IEPA, HPRD50, AIMed, and BioInfer corpora using cross-validation, cross-learning, and cross-corpus evaluation. Empirical evaluations demonstrate that our method is effective and outperforms several well-known PPI extraction methods. We also discuss features that may be useful for future research.
Acknowledgements
Chinese Abstract
English Abstract
Chapter 1 Introduction
Chapter 2 Related Work
Chapter 3 Methodology
3.1 Candidate Sentence Generation
3.2 Learning Interaction Pattern from Biomedical Literature
3.3 Interactive Pattern Tree Construction
3.4 Convolution Tree Kernel
Chapter 4 Experiments
4.1 Evaluation Dataset
4.2 Experimental Setting and Evaluation Methods
4.3 Results and Discussion
4.4 Features for Future Research
Chapter 5 Concluding Remarks
References
