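Graduate student: 游志銘
Graduate student (English name): Chih-Ming Yu
Thesis title: 從生物文獻中萃取物件之關係
Thesis title (English): Relation Extraction from Biological Literature
Advisor: 梁婷
Advisor (English name): Tyne Liang
Degree: Master's
Institution: National Chiao Tung University (國立交通大學)
Department: Department of Computer and Information Science (資訊科學系)
Discipline: Engineering
Field: Electrical Engineering and Computer Science
Thesis type: Academic thesis
Year of publication: 2003
Graduation academic year: 91 (ROC calendar)
Language: English
Number of pages: 64
Chinese keywords: 資訊萃取 (information extraction); 蛋白質 (protein); 疾病 (disease)
English keywords: Information Extraction; Protein; Disease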
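Abstract (translated from Chinese): This thesis proposes an information extraction system for biological literature that extracts biological relations and converts them into template form. In the information extraction component, a weighted Bayes classifier is implemented to classify sentences into three classes: related, unrelated, and ambiguous. A lexicon and a noun phrase chunker are used to select candidate noun phrases for biological events. The core technique of the information extraction is the proposed set of heuristic rules, which can be combined with conjunctions to handle the extraction of multiple relations. The system can be described as a heuristic-rule-based system. In the relation extraction experiment, recall is 79.30% and precision is 85.61%. In addition, we collected a further 100 sentences for a relation extraction experiment, obtaining an average recall of 78.71% and an average precision of 83%.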

This thesis presents PADIES, an information extraction (IE) system for biological literature that extracts biological relations and converts them into templates. In the IE module, we implement a weighted Naive Bayes classifier that sorts sentences into three classes: Yes, No, and Ambiguous. We also use lexicons and a noun phrase (NP) chunker to extract noun phrases as arguments of events. The kernel of the IE module is a set of heuristic rules that, combined with conjunctions, can handle the extraction of multiple relations. PADIES is thus a rule-based IE system. In the relation extraction experiment, recall is 79.30% and precision is 85.61%. In an additional relation extraction experiment on 100 newly collected sentences, the average recall is 78.71% and the average precision is 83%.
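The sentence classification step mentioned in the abstract can be illustrated with a minimal sketch. This record does not give the thesis's actual weighting scheme, so the code below makes its own assumptions: a multinomial Naive Bayes with Laplace smoothing, where each token's log-likelihood is multiplied by an optional per-word weight. The names `train`, `classify`, `TOY_DATA`, and the toy sentences are hypothetical and are not taken from PADIES.

```python
import math
from collections import Counter, defaultdict

# The three class labels come from the abstract; everything else is assumed.
CLASSES = ("Yes", "No", "Ambiguous")

# Hypothetical toy training set of (tokens, label) pairs, for illustration only.
TOY_DATA = [
    (["protein", "binds", "receptor"], "Yes"),
    (["protein", "activates", "kinase"], "Yes"),
    (["samples", "were", "collected"], "No"),
    (["patients", "were", "enrolled"], "No"),
    (["protein", "may", "affect", "disease"], "Ambiguous"),
]


def train(labeled_sentences):
    """Count sentences per class and word occurrences per class."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in labeled_sentences:
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def classify(tokens, model, weights=None):
    """Return the class with the highest weighted log-posterior score.

    `weights` maps a word to a multiplier on its log-likelihood; words
    without an entry get weight 1.0, which gives plain Naive Bayes.
    This weighting is an assumption, not the scheme used in the thesis.
    """
    class_counts, word_counts, vocab = model
    weights = weights or {}
    total_sentences = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in CLASSES:
        if class_counts[label] == 0:
            continue  # class never seen in training
        score = math.log(class_counts[label] / total_sentences)
        total_words = sum(word_counts[label].values())
        for token in tokens:
            if token not in vocab:
                continue  # ignore words unseen in training
            # Laplace (add-one) smoothed likelihood of the word in this class.
            likelihood = (word_counts[label][token] + 1) / (total_words + len(vocab))
            score += weights.get(token, 1.0) * math.log(likelihood)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train(TOY_DATA)
print(classify(["protein", "binds", "kinase"], model))
print(classify(["protein", "may"], model, weights={"may": 3.0}))
```

In this toy setting, raising the weight of a hedging word such as "may" can move a sentence from the Yes class to the Ambiguous class, which is the kind of effect a feature-weighting scheme is meant to produce.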

Abstract (in Chinese)
Abstract (in English)
Acknowledgements
Contents
List of Tables
List of Figures
Chapter 1 Introduction
1.1 Background
1.2 Goal
1.3 Overview of System Architecture
1.4 The Synopsis
Chapter 2 Related Works
2.1 Corpus Introductions
2.1.1 PIR
2.1.2 Swiss-Prot
2.1.3 GENIA Corpus and Bio1
2.1.4 MEDLINE
2.2 Biological Information Extraction Methods
2.2.1 Modified General Purpose IE System
2.2.2 Statistical Methods
2.2.3 Relational Methods
2.2.4 Rule-Based Methods
Chapter 3 Resource Preprocessing
3.1 Corpus Preparation
3.2 Lexicon Collection
Chapter 4 Proposed Information Extraction Method
4.1 Proposed IE Method Architecture
4.2 Sentence Classification
4.2.1 Architecture of Sentence Classification
4.2.2 Naive Bayes Classifier
4.2.3 Weighted Naive Bayes Classifier
4.2.4 Pearson’s Chi-Square Test
4.2.5 Weighted Pearson’s Chi-Square Test
4.2.6 Sentence Classification Conclusion
4.3 Relation Instance Extractor
Chapter 5 Relation Extraction and Experiment
5.1 Relation Extraction
5.1.1 Simple Relation Extraction
5.1.2 Multiple Verbs
5.1.3 Conjunction Cases
5.2 Experiment
5.3 Analysis
5.3.1 Wrong Tagging and NP Chunking
5.3.2 Anaphora Handling
5.3.3 Semantic Understanding
5.3.4 Syntactic Complexity
Chapter 6 Future Work and Conclusion
6.1 Future Work
6.2 Conclusion
Reference
Appendix A: Heuristic Rules
Appendix B: Value Distribution of Sentence Classification Method
Appendix C: Top 50 Feature Words in Three Classes

