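Author: Cheng-Fong Tang (唐健峰)
Title: Identification Protein Structure Similarity with Coupling Secondary Structure Information and Amino Acid Sequence Alignment by Using Genetic Algorithm: A Study of TIM-barrel Fold.
Title (Chinese): 利用基因演算法結合二級結構資訊與氨基酸序列比對以達成蛋白質結構相似度之辨識:應用於TIM-barrel結構辨識
Advisor: Chuen-Tsai Sun (孫春在)
Degree: Master's
Institution: National Chiao Tung University (國立交通大學)
Department: Computer and Information Science (資訊科學系)
Field: Engineering
Discipline: Electrical Engineering and Computer Science
Thesis type: Academic thesis
Year of publication: 2002
Academic year of graduation: 90 (ROC calendar)
Language: English
Pages: 66
Keywords (translated from Chinese): protein structure identification, secondary structure, genetic algorithm
Keywords (English, as indexed): protein, structure, identification, secondary, genetic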
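Abstract (translated from the Chinese original): For comparing protein structures, sequence alignment is usually the standard way to obtain first-hand information. However, owing to certain evolutionary factors, current sequence alignment tools (such as BLAST) fail to detect homology among proteins that are structurally or functionally related but share little sequence similarity. This study attempts to integrate sequence alignment with secondary structure information into a new method, so as to discover more homologies for a given protein structure. First, many linear scoring functions are generated by a genetic algorithm, from which similarity thresholds for this class of structure are obtained. A voting mechanism then yields a conclusion about the structural relationship of the proteins. The method constructed in this study can be applied to all protein structures. However, given the importance of the TIM-barrel structure, this study currently applies it to TIM-barrel structure identification.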
Abstract (English original): For protein structure similarity, sequence alignment is the standard tool for obtaining a first clue. However, present sequence alignment tools, such as BLAST, cannot detect homologies among distantly related proteins because of certain evolutionary factors. This thesis develops a method that integrates sequence alignment with secondary structure information to detect more homologies for a specified structure. Several linear scoring functions are first constructed by a genetic algorithm, from which thresholds for structural similarity are obtained. Through a voting mechanism, the proposed method then reaches a decision on the structural relationship between the proteins. The method should be applicable to all structures. Nevertheless, the thesis applies it only to the TIM-barrel structure because of the importance of TIM-barrel.
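The scoring-and-voting step described in the abstract can be illustrated roughly as follows. This is a minimal sketch under stated assumptions, not the thesis's implementation: the two features (a normalized sequence-alignment score and a secondary-structure agreement score), the weights, the thresholds, the names `LinearScorer` and `majority_vote`, and the simple-majority rule are all hypothetical. In the thesis the scoring functions are produced by a genetic algorithm, which is not shown here.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class LinearScorer:
    """One linear scoring function paired with its own similarity threshold."""
    weights: Sequence[float]  # hypothetical weights (the thesis evolves these with a GA)
    threshold: float          # hypothetical cutoff for voting "structurally similar"

    def score(self, features: Sequence[float]) -> float:
        # Weighted sum of the features: the linear scoring function itself.
        if len(features) != len(self.weights):
            raise ValueError("feature/weight length mismatch")
        return sum(w * x for w, x in zip(self.weights, features))

    def votes_similar(self, features: Sequence[float]) -> bool:
        # A scorer votes "similar" when its score reaches its threshold.
        return self.score(features) >= self.threshold


def majority_vote(scorers: List[LinearScorer], features: Sequence[float]) -> bool:
    """Return True when more than half of the scorers vote 'similar'.

    Simple majority is an assumption; the abstract only says "voting mechanism".
    """
    yes = sum(1 for s in scorers if s.votes_similar(features))
    return yes * 2 > len(scorers)


# Placeholder scorers; none of these numbers come from the thesis.
SCORERS = [
    LinearScorer(weights=(0.7, 0.3), threshold=0.5),
    LinearScorer(weights=(0.4, 0.6), threshold=0.5),
    LinearScorer(weights=(0.2, 0.8), threshold=0.6),
]

# features = (normalized sequence-alignment score, secondary-structure agreement)
weak_sequence_strong_structure = (0.2, 0.9)
weak_both = (0.2, 0.3)

print(majority_vote(SCORERS, weak_sequence_strong_structure))
print(majority_vote(SCORERS, weak_both))
```

With these placeholder numbers, a pair with weak sequence similarity but strong secondary-structure agreement is still accepted by two of the three scorers, which is the kind of distant relationship the abstract says plain sequence alignment tends to miss.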
Abstract (Chinese)
Abstract (English)
Acknowledgment
Contents
List of Figures
List of Tables
Chapter 1. Introduction
1.1. Motivation
1.2. TIM-barrel
1.3. Thesis Objectives
1.4. Organization of Thesis
Chapter 2. Survey of Related Work
2.1. Overview of Proteins
2.1.1. Protein
2.1.2. Different Levels of Protein Structure
2.2. Sequence Alignment
2.2.1. Substitution Matrices
2.2.2. Alignment Statistics
2.3. Algorithms of Sequence Alignment
2.3.1. Dot Plots
2.3.2. FASTA
2.3.3. BLAST
2.3.4. PSI-BLAST
2.3.5. Amino Acid Alignment with Structure-derived Substitution Matrices
2.3.6. Amino Acid Alignment / Secondary Structure Alignment
2.4. Three-Dimensional Structure Alignment
2.4.1. CE
2.4.2. Dali
2.5. Protein Structure Classification
2.5.1. Automatic Method
2.5.2. Semi-automatic Method
2.6. DSSP
2.7. Genetic Algorithm
Chapter 3. System Modeling
3.1. System Overview
3.2. Pre-Process
3.3. Linear Scoring Function
3.4. Genetic Algorithm
3.4.1. Chromosomes
3.4.2. Selection
3.4.3. Crossover
3.4.4. Mutation
3.4.5. Fitness Function
3.5. Voting Model
3.6. Conclusion
Chapter 4. Analysis of Experimental Results
4.1. Experimental Data
4.1.1. Training Data
4.1.2. Testing Data
4.2. Parameters in Genetic Algorithm
4.3. Experimental Results
4.4. Summary
Chapter 5. Concluding Remarks
References