
臺灣博碩士論文加值系統 (National Digital Library of Theses and Dissertations in Taiwan)

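Author: 張志豪 (Chih-Hao Chang)
Thesis title: 基於項目集樹樣式探勘之蛋白質功能註解系統 (Gene Ontology Supported Protein Function Annotation via Item-Set Tree-Based Pattern Mining)
Advisors: 謝孫源 (Sun-Yuan Hsieh), 蔣榮先 (Jung-Hsien Chiang)
Degree: Master's
Institution: 國立成功大學 (National Cheng Kung University)
Department: 資訊工程學系碩博士班 (Department of Computer Science and Information Engineering, master's and doctoral program)
Discipline: Engineering
Field: Electrical engineering and computer science
Thesis type: Academic thesis
Year of publication: 2004
Academic year of graduation: 92 (ROC calendar)
Language: Chinese
Number of pages: 48
Keywords: Function Annotation, Text Mining, Information Extraction, Pattern Mining, Bioinformatics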
As the biomedical literature grows rapidly, information extraction is quickly becoming an indispensable technique in bioinformatics. Among extraction approaches, those based on pattern matching can extract meaningful information and knowledge from documents quickly and accurately.

In this thesis, we propose a pattern mining method based on the item-set tree. It replaces the time-consuming, labor-intensive manual definition of patterns by experts that pattern-based information extraction normally requires. Building on this method, we develop a system that automatically identifies gene and/or protein information in the biomedical literature, namely biological process, cellular component, and molecular function, and produces the corresponding Gene Ontology (GO)-based annotations.

The basic idea behind mining for patterns is that, given sentences phrased in similar ways, the method can pick out their main features. In other words, it filters out secondary parts, such as words used only for emphasis, and then uses the remaining features to find sentences in documents that describe protein function. General mining methods, however, tend to ignore low-frequency features. The item-set tree-based mining method used in this thesis overcomes this drawback: items that are infrequent yet still useful as features are correctly selected.
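The drawback mentioned in the abstract, that threshold-based mining discards infrequent items, can be illustrated with a small sketch. It is not the thesis's implementation, and it is not the item-set tree of Kubat et al. itself. It uses a plain prefix tree with counts as a simplified stand-in. The toy sentences, function names, and the 0.5 threshold are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical toy "sentences", each already reduced to a set of tokens (items).
sentences = [
    {"protein", "is", "involved", "in", "apoptosis"},
    {"protein", "is", "involved", "in", "transcription"},
    {"protein", "is", "localized", "to", "nucleus"},
    {"protein", "is", "involved", "in", "signaling"},
]


def frequent_items(transactions, min_support):
    """Items whose support (fraction of transactions) reaches min_support."""
    counts = Counter(item for t in transactions for item in t)
    n = len(transactions)
    return {item for item, c in counts.items() if c / n >= min_support}


class Node:
    """Prefix-tree node: count of transactions whose sorted items pass through it."""

    def __init__(self):
        self.count = 0
        self.children = {}  # item -> Node


def insert(root, transaction):
    """Insert one transaction along the path given by its sorted items."""
    node = root
    node.count += 1
    for item in sorted(transaction):
        node = node.children.setdefault(item, Node())
        node.count += 1


def count_containing(node, query):
    """Count transactions below `node` that contain every item in the sorted list `query`."""
    if not query:
        return node.count
    total = 0
    for item, child in node.children.items():
        if item == query[0]:
            total += count_containing(child, query[1:])
        elif item < query[0]:
            # query[0] may still appear deeper on this path
            total += count_containing(child, query)
        # item > query[0]: paths are sorted, so query[0] cannot appear deeper
    return total


root = Node()
for s in sentences:
    insert(root, s)

# A global threshold keeps only the common wording and drops "localized".
common = frequent_items(sentences, min_support=0.5)

# A targeted query still recovers the support of the infrequent item set.
rare_count = count_containing(root, sorted({"localized", "to"}))
print(sorted(common), rare_count)
```

With a single global threshold, "localized" and "to" never reach the candidate set, even though they characterize cellular-component sentences. A tree that stores all transactions with counts lets a targeted query recover their support regardless of frequency.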
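Abstract (Chinese)
Abstract (English)
Acknowledgements
Table of Contents
List of Tables
List of Figures
Chapter 1 Introduction
1.1 Research Motivation
1.2 Research Objectives
1.3 System Overview
1.4 Chapter Outline
Chapter 2 Literature Review
2.1 Bioinformatics
2.2 Data Mining
2.3 Information Extraction
2.4 International Conferences and Competitions
2.4.1 KDD Cup 2002
2.4.2 BioCreAtIvE 2003
2.5 Related Systems
2.5.1 Gene Ontology
2.5.2 PubMed
2.5.3 MedMiner
2.5.4 BioBiblioMetrics
Chapter 3 Protein Function Annotation System
3.1 System Architecture
3.2 Morphological Variation of Names
3.3 Pattern Mining
3.4 Function Annotation Extraction
Chapter 4 Experimental Design and Analysis
4.1 Experimental Design
4.2 Experimental Data
4.2.1 Preprocessing of Experimental Documents
4.2.2 Generation of Training Samples
4.3 Experimental Results and Analysis
Chapter 5 Conclusions and Future Work
5.1 Conclusions
5.2 Future Work
References
Appendix A - System Demonstration
Vita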