
National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)

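Graduate student: Shih-Hsien Wang (王世賢)
Thesis title: Study of Motif Correlation in Proteins by Data Mining (應用資料探勘技術於蛋白質保留性胺基酸序列之關聯性)
Advisor: Jorng-Tzong Horng (洪炯宗)
Degree: Master's
Institution: National Central University (國立中央大學)
Department: Graduate Institute of Computer Science and Information Engineering
Discipline: Engineering
Subfield: Electrical and Information Engineering
Thesis type: Academic thesis
Year of publication: 2002
Academic year of graduation: 90 (ROC calendar)
Language: English
Number of pages: 40
Keywords (Chinese): 蛋白質 (protein)
Keywords (English): protein; motif; mining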

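Abstract (translated from the Chinese): During evolution, some regions of a protein sequence tend to be conserved more readily than others, and these conserved regions often play an important role in protein structure or function. Correlations among motifs in protein sequences may hint at information about the biological functions of proteins, and such information also offers clues for analyzing the evolutionary relationship between human genes and those of other species. Our main goal is to find correlations among motifs in the structure of protein sequences. In this study the protein sequences are taken mainly from the PIR-NREF database, and the motifs from the PROSITE database. We use data mining methods to search for correlations among motifs in protein sequences.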


In protein sequences, some regions are better conserved than others during evolution. These conserved regions generally play an important role in the function or structure of proteins. Knowledge of the correlations between protein motifs may shed new light on the biological functions of proteins and offer a basis for analyzing the evolution of the human genome and other genomes. The aim here is to find motif correlations in protein structures. The protein sequences used in this study are taken from the PIR-NREF database, and the motifs from the PROSITE database. We apply a data mining approach to discover correlations among motifs in protein sequences.
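
As a rough illustration of what association-rule mining over motifs can look like, the sketch below treats each protein as a set of motif identifiers and reports pairwise rules "motif A implies motif B" that pass support and confidence thresholds. It is a minimal sketch under stated assumptions: the function name `mine_pair_rules`, the threshold values, and the toy protein and motif labels are hypothetical and do not come from the thesis, whose actual algorithm and parameters are not described in this record.

```python
from itertools import combinations


def mine_pair_rules(proteins, min_support=0.5, min_confidence=0.8):
    """Find pairwise rules lhs -> rhs over sets of motifs.

    proteins: dict mapping a protein ID to the set of motif IDs found in it.
    Returns a sorted list of (lhs, rhs, support, confidence) tuples.
    """
    n = len(proteins)
    single_counts = {}  # motif -> number of proteins containing it
    pair_counts = {}    # (motif_a, motif_b) -> number of proteins with both

    for motifs in proteins.values():
        unique = sorted(set(motifs))
        for m in unique:
            single_counts[m] = single_counts.get(m, 0) + 1
        for a, b in combinations(unique, 2):
            pair_counts[(a, b)] = pair_counts.get((a, b), 0) + 1

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # fraction of proteins containing both motifs
        if support < min_support:
            continue
        # Each frequent pair yields two candidate rules, one per direction.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / single_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)


# Hypothetical toy data: IDs are placeholders, not real PIR-NREF or PROSITE entries.
proteins = {
    "P1": {"M_A", "M_B"},
    "P2": {"M_A", "M_B", "M_C"},
    "P3": {"M_A"},
    "P4": {"M_B", "M_C"},
}

rules = mine_pair_rules(proteins, min_support=0.5, min_confidence=0.6)
for lhs, rhs, support, confidence in rules:
    print(f"{lhs} -> {rhs}  support={support:.2f}  confidence={confidence:.2f}")
```

Restricting the search to pairs keeps the example short; a full association-rule miner would also consider larger motif sets and prune candidates by support level by level.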


Contents
Chapter 1 Introduction
Chapter 2 Related Work
2.1 Protein Databases
2.2 Protein domain family database
2.3 Protein Structure Related Databases
2.4 Association rules
Chapter 3 Our Approach
3.1 Materials
3.2 Preprocessing and Mapping
3.3 Mining Association Rules
Chapter 4 Results
4.1 Environments of Implementation
4.2 Mining Result
Chapter 5 Discussion
Chapter 6 Conclusions
References
Appendix A


