National Digital Library of Theses and Dissertations in Taiwan

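Graduate student: 蘇宇辰 (Su, Yu-Chen)
Thesis title: A Study of Biomedical Text Mining for Protein-protein Interaction Passage Extraction
Thesis title (Chinese): 以生物文獻探勘技術辨識蛋白質互動片段之研究
Advisor: 許聞廉 (Hsu, Wen-Lian)
Oral examination committee: 張詠淳 (Chang, Yung-Chun); 戴鴻傑 (Dai, Hong-Jie)
Oral examination date: 2017-01-13
Degree: Master's
Institution: National Tsing Hua University
Department: Department of Computer Science
Discipline: Engineering
Field: Electrical Engineering and Computer Science
Thesis type: Academic thesis
Year of publication: 2017
Academic year of graduation: 105 (ROC calendar, i.e., the 2016/2017 academic year)
Language: English
Number of pages: 52
Keywords: Text Mining; Protein-Protein Interaction; Interaction Pattern Generation; Convolution Tree Kernel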
Abstract
In recent years, the volume of biomedical literature has grown rapidly, making automated relation extraction methods increasingly critical. Among all types of relations, protein–protein interactions (PPIs) give insight into many aspects of the structural and functional organization of cells, and knowledge of them can shed light on the molecular mechanisms of biological processes. Identifying interactions between proteins mentioned in biomedical literature is therefore a frequently studied text-mining problem in the life sciences. In this thesis we propose PIPE, an interaction pattern generation module that captures frequent PPI patterns in text and that we previously used in the BioCreative 2015 competition. We also present an interaction pattern tree kernel, which combines these PPI patterns with a convolution tree kernel to extract protein–protein interactions. The interaction pattern tree is constructed through three operations (branching, pruning, and ornamenting) and incorporates syntactic, content, and semantic information from the text. The method was evaluated on the LLL, IEPA, HPRD50, AIMed, and BioInfer corpora using cross-validation, cross-learning, and cross-corpus evaluation. Empirical results show that it is effective and outperforms several well-known PPI extraction methods. Finally, we discuss several features that may be useful for future research.
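The interaction pattern tree kernel builds on the convolution tree kernel. In its standard form (Collins and Duffy, 2001), that kernel counts the tree fragments two parse trees have in common. It computes the count recursively over pairs of nodes whose grammar productions match. The Python sketch below implements only that standard kernel, with an optional decay factor `lam` that down-weights larger fragments. It is not the thesis's interaction pattern tree kernel, because it performs no branching, pruning, or ornamenting. The nested-tuple tree encoding and the function names `tree_kernel` and `normalized_kernel` are illustrative assumptions.

```python
from functools import lru_cache
from math import sqrt
from typing import Union

# Hypothetical encoding: a tree is a word (str) or a tuple
# (label, child_1, ..., child_k). Words are assumed to appear only
# under preterminal nodes, as in Penn Treebank-style parses.
Tree = Union[str, tuple]


def production(node: tuple) -> tuple:
    """Grammar production at a node, e.g. ('NP', ('DT', 'NN'))."""
    label, *children = node
    return (label, tuple(c if isinstance(c, str) else c[0] for c in children))


def is_preterminal(node: tuple) -> bool:
    """True when every child of the node is a word."""
    return all(isinstance(c, str) for c in node[1:])


def internal_nodes(tree: Tree) -> list:
    """All non-word nodes of the tree, in pre-order."""
    if isinstance(tree, str):
        return []
    nodes = [tree]
    for child in tree[1:]:
        nodes.extend(internal_nodes(child))
    return nodes


def tree_kernel(t1: Tree, t2: Tree, lam: float = 1.0) -> float:
    """Collins-Duffy kernel: (decay-weighted) count of shared tree fragments."""

    @lru_cache(maxsize=None)
    def delta(n1: tuple, n2: tuple) -> float:
        # Fragments rooted at n1 and n2 can only match if their productions agree.
        if production(n1) != production(n2):
            return 0.0
        if is_preterminal(n1):
            return lam
        score = lam
        for c1, c2 in zip(n1[1:], n2[1:]):
            # Each child either ends the fragment (the 1) or extends it
            # with any fragment shared below that child (the delta term).
            if isinstance(c1, tuple) and isinstance(c2, tuple):
                score *= 1.0 + delta(c1, c2)
        return score

    return float(sum(delta(a, b) for a in internal_nodes(t1) for b in internal_nodes(t2)))


def normalized_kernel(t1: Tree, t2: Tree, lam: float = 1.0) -> float:
    """Cosine-style normalization so that K(t, t) = 1."""
    denom = sqrt(tree_kernel(t1, t1, lam) * tree_kernel(t2, t2, lam))
    return tree_kernel(t1, t2, lam) / denom if denom else 0.0


if __name__ == "__main__":
    t1 = ("NP", ("DT", "the"), ("NN", "protein"))
    t2 = ("NP", ("DT", "the"), ("NN", "kinase"))
    print(tree_kernel(t1, t1))        # 6.0
    print(tree_kernel(t1, t2))        # 3.0
    print(normalized_kernel(t1, t2))  # 0.5
```

Consider the noun phrases "the protein" and "the kinase". They share three fragments: [NP DT NN], [NP [DT the] NN], and [DT the]. Each phrase has six fragments of its own, so the normalized similarity is 3 / 6 = 0.5. In an SVM-based PPI classifier, one common option is to pass such a kernel to the learner as a precomputed Gram matrix. That wiring is not shown here.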
Table of Contents
Acknowledgements
Chinese Abstract
English Abstract
Chapter 1 Introduction
Chapter 2 Related Work
Chapter 3 Methodology
3.1 Candidate Sentence Generation
3.2 Learning Interaction Pattern from Biomedical Literature
3.3 Interactive Pattern Tree Construction
3.4 Convolution Tree Kernel
Chapter 4 Experiments
4.1 Evaluation Dataset
4.2 Experimental Setting and Evaluation Methods
4.3 Results and Discussion
4.4 Features for Future Research
Chapter 5 Concluding Remarks
References