
National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)

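Graduate student: 林凡凱 (Fan Kai, Lin)
Thesis title (Chinese): 蛋白質二級結構預測-使用基因演算法
Thesis title (English): Protein Secondary Structure Prediction Using Genetic Algorithm
Advisor: 孫春在 (Chuen-Tsai Sun)
Degree: Master's
Institution: National Chiao Tung University (國立交通大學)
Department: Department of Computer and Information Science (資訊科學系)
Discipline: Engineering
Field: Electrical Engineering and Computer Science
Thesis type: Academic thesis
Year of publication: 2002
Graduation academic year: 90 (ROC calendar)
Language: Chinese
Number of pages: 85
Keywords: protein secondary structure prediction (蛋白質二級結構預測); genetic algorithm (基因演算法); schema (基模)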
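Given the accuracy of current protein secondary structure prediction methods, their results cannot be fully trusted and can serve only as an auxiliary tool. A prediction method would be a more valuable reference if it gave the user more information for judging how good a prediction is.
Most earlier protein secondary structure prediction methods are "black box" methods, such as neural networks. The user obtains only a prediction and a very large number of weights used to make the decision. These weights carry no meaning for the user, so the prediction is hard to analyze.
This study uses a genetic algorithm to find schemas in protein primary structure. Predicting protein secondary structure with these schemas gives a prediction accuracy above 60% (this study does not incorporate multiple sequence alignment information). The user can analyze a prediction from the information the schemas provide, including which protein a schema occurs in, which substitution matrix was used to search for the schema, and which schema was used to make the decision during prediction.
In addition, this study found some schemas whose accuracy in predicting sheets reaches 70%. In earlier research, sheets were the hardest structure to predict.
This study also searched the proteins in NRPDB (non-redundant PDB) and found that certain schemas occur more than three hundred times, with prediction accuracy reaching 80% or even 90%.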
The accuracy of protein secondary structure prediction is currently about 75%, so secondary structure cannot be predicted exactly. Moreover, a reported accuracy of 75% does not tell us whether an unknown protein will be predicted with that accuracy. Users need more information to judge whether a prediction is good enough. Most earlier protein secondary structure prediction models, such as neural networks, are “black box” methods, and users can extract little information from the thousands of weights in those models to analyze a prediction.
In this thesis, we use a genetic algorithm to find schemas in protein primary structure. The accuracy of our model, which does not use multiple sequence alignment information, is about 60%. Users learn which protein a schema comes from, which substitution matrix was used to find it, which schema was used to make a prediction, and so on. This information helps users analyze the prediction result.
Furthermore, we found some schemas that predict sheets with an accuracy of about 70%. The “sheet” secondary structure has been difficult to predict in past work.
Another contribution of this thesis is that we found some schemas that appear more than 300 times in NRPDB (non-redundant PDB), with prediction accuracy between 80% and 90%.
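To make the idea of schema-based prediction concrete, the following is a minimal sketch, not the thesis's implementation. This record does not give the thesis's encoding, window size, fitness function, or matching thresholds, so everything here is an assumption made for illustration: the wildcard symbol `*`, the `(pattern, label)` schema representation, the rule that a matching schema labels the centre residue of its window, the default label `C`, and the toy substitution scores. The sketch shows exact matching, a substitution-matrix score as a softer alternative, and a prediction that records which schema produced each label, which is the kind of information the abstract says users can inspect.

```python
from typing import Dict, List, Optional, Tuple

WILDCARD = "*"  # assumed "don't care" symbol; the thesis's encoding is not given here

# Hypothetical schema representation: (residue pattern, predicted label),
# with labels H = helix, E = sheet, C = coil.
Schema = Tuple[str, str]

# Illustrative toy scores only; a real run would load a published matrix.
TOY_MATRIX: Dict[Tuple[str, str], int] = {
    ("A", "A"): 4, ("L", "L"): 4, ("V", "V"): 4, ("I", "V"): 3,
}


def matches_exact(pattern: str, window: str) -> bool:
    """Return True if each non-wildcard pattern position equals the window residue."""
    return len(pattern) == len(window) and all(
        p == WILDCARD or p == w for p, w in zip(pattern, window)
    )


def substitution_score(pattern: str, window: str,
                       matrix: Dict[Tuple[str, str], int]) -> int:
    """Sum substitution scores over the non-wildcard positions.

    The matrix is treated as symmetric; missing pairs score -1 (arbitrary choice).
    """
    score = 0
    for p, w in zip(pattern, window):
        if p != WILDCARD:
            score += matrix.get((p, w), matrix.get((w, p), -1))
    return score


def predict(sequence: str, schemas: List[Schema],
            default: str = "C") -> Tuple[str, List[Optional[int]]]:
    """Label each residue and record which schema (by index) produced the label."""
    labels = [default] * len(sequence)
    used: List[Optional[int]] = [None] * len(sequence)
    for start in range(len(sequence)):
        for index, (pattern, label) in enumerate(schemas):
            window = sequence[start:start + len(pattern)]
            if matches_exact(pattern, window):
                centre = start + len(pattern) // 2
                # The first schema to claim a position wins.
                if used[centre] is None:
                    labels[centre] = label
                    used[centre] = index
    return "".join(labels), used


example_schemas = [("A*L", "H"), ("V*V", "E")]
labels, used = predict("AELKVIVG", example_schemas)
print(labels)  # CHCCCECC
print(used)    # [None, 0, None, None, None, 1, None, None]
print(substitution_score("V*V", "VKI", TOY_MATRIX))  # 7
```

Because `predict` returns the index of the schema behind each label, a user can trace every predicted residue back to a specific pattern, in contrast to inspecting the weights of a neural network. Replacing `matches_exact` with a threshold on `substitution_score` would let near-matches count, at the cost of choosing a threshold.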
Abstract (Chinese)
Abstract (English)
Acknowledgments
Table of Contents
Chapter 1 Introduction
1.1 Research Motivation
1.1.1 The Importance of Protein Secondary Structure
1.1.2 Why Predict Protein Secondary Structure
1.2 Research Hypotheses
1.3 Research Purpose
1.4 Research Objectives
1.5 Thesis Organization
Chapter 2 Literature Review
2.1 Introduction to Protein Structure
2.1.1 Proteins
2.1.2 Building Blocks
2.2 Protein Secondary Structure Prediction
2.2.1 Development of Secondary Structure Prediction
2.2.2 Evaluation of Secondary Structure Prediction
2.3 Protein Databases
2.4 Protein Structure Classification
Chapter 3 Schemas and Genetic Algorithm Model Design
3.1 Schema Hypotheses
3.1.1 Advantages of Schemas
3.1.2 Difficulties with Schemas
3.2 Purpose and Concept of the Model Design
3.3 Model Architecture
3.3.1 Definitions and Preprocessing
3.3.2 Encoding
3.3.3 Chromosome Population
3.3.4 Fitness Function
3.3.5 Parent Selection Mechanism
3.3.6 Operator Design
3.4 Schema-Based Prediction Mechanism
3.4.1 Prediction Procedure
3.4.2 Exact Matching Method
3.4.3 Substitution Matrix Method
Chapter 4 Model Implementation and Experimental Results
4.1 Protein Data
4.1.1 Selection of Protein Data
4.1.2 Selection of Training and Test Data
4.2 System Flow
4.3 Control Parameters
4.3.1 Crossover Rate and Mutation Rate
4.3.2 Population Size
4.3.3 Window Size
4.3.4 Number of Schemas
4.3.5 Prediction Mechanism
4.4 Analysis of Experimental Results
4.4.1 Search Space
4.4.2 Result Analysis and Comparison
Chapter 5 Conclusions and Suggestions
5.1 Conclusions
5.2 Future Research Directions
References