National Digital Library of Theses and Dissertations in Taiwan

Graduate student: 林宣宏
Graduate student (English name): Hsuan-Hung Lin
Thesis title: 利用多軌跡搜尋法調校支援向量機參數以預測雙硫鍵之鍵結型態
Thesis title (English): Disulfide Bonding Patterns Prediction Using Support Vector Machine with Parameters Tuned by Multiple Trajectory Search
Advisor: 曾怜玉
Advisor (English name): Lin-Yu Tseng
Degree: Doctoral
Institution: National Chung Hsing University
Department: Department of Applied Mathematics
Discipline: Mathematics and Statistics
Field: Mathematics
Thesis type: Academic thesis
Year of publication: 2010
Academic year of graduation: 98 (ROC calendar)
Language: English
Number of pages: 63
Chinese keywords: 支援向量機
English keywords: disulfide bonding state; disulfide bonding pattern; support vector machine; multiple trajectory search
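Protein structure prediction is a well-known problem in computational biology and remains a major challenge in structural biology. Disulfide bonding patterns play an important role in stabilizing protein structures, and for protein folding prediction, correctly predicting the disulfide bonding pattern can greatly reduce the search space. Correctly predicting the positions of disulfide bonds therefore helps solve the protein folding problem. From this perspective, a method that accurately predicts disulfide bonding patterns can effectively advance the prediction of a protein's three-dimensional structure and its function.
In this study, we first used the position-specific scoring matrix (PSSM), the normalized disulfide bond distance, the predicted protein secondary structure, and physicochemical indices of amino acids as the input features of a support vector machine (SVM), and trained a prediction model to compute the probability that a cysteine pair forms a bond. In addition, we used multiple trajectory search (MTS) to tune the SVM parameters and the window sizes of the features, and then applied the maximum weight perfect matching algorithm to the bonding probabilities output by the SVM to find the disulfide bonding pattern. When the bonding states of the cysteines are known in advance, the experimental results on dataset SP39 show that the proposed method achieves a best accuracy of 79.8% for predicting the disulfide bonding pattern (QP) and a best accuracy of 80.9% for predicting whether a cysteine pair forms a bond (QC). When the bonding states of the cysteines are not known in advance, on dataset SPX the proposed method raises the accuracy from the best previously published results of 51% (QP) and 52% (QC) to 54.5% (QP) and 60% (QC), respectively.
Next, we used features related to the protein tertiary structure. MODELLER was used to predict the coordinates of the Cα (alpha carbon) atom of each amino acid in the protein sequence. We first computed the Euclidean distances between amino acids and then derived from them the normalized pair distance (NPD) vector as the input feature. MTS was used to tune the SVM parameters and the window size of the NPD feature, and a modified maximum weight perfect matching algorithm was applied to the bonding probabilities output by the SVM to find the disulfide bonding pattern. The experiments show that, when the bonding states of the cysteines are known in advance, on dataset SP39 QP rises substantially to 92.2% and QC rises substantially to 94.2%. When the bonding states of the cysteines are not known in advance, on dataset SPX QP reaches 84.4% and QC reaches 94.6%. These results show that the proposed method effectively improves the accuracy of disulfide bond prediction.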
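The distance step described above can be sketched as follows. This is a minimal illustration, assuming the predicted Cα coordinates are already available as one (x, y, z) tuple per residue; the function name `cysteine_pair_distances` is hypothetical, and the sketch stops at raw Euclidean distances because the normalization that turns them into the NPD vector is not specified in this record.

```python
import math
from itertools import combinations


def cysteine_pair_distances(sequence, ca_coords):
    """Return {(i, j): distance} for every pair of cysteine positions i < j.

    sequence  -- one-letter amino acid string
    ca_coords -- list of (x, y, z) C-alpha coordinates, one per residue
                 (assumed to come from a predicted structure model)
    """
    if len(sequence) != len(ca_coords):
        raise ValueError("one C-alpha coordinate is required per residue")
    # 0-based positions of all cysteines in the sequence
    cys = [i for i, aa in enumerate(sequence) if aa == "C"]
    # Euclidean distance between the C-alpha atoms of each cysteine pair
    return {
        (i, j): math.dist(ca_coords[i], ca_coords[j])
        for i, j in combinations(cys, 2)
    }


# Toy example: two cysteines at positions 1 and 3
distances = cysteine_pair_distances(
    "ACGC", [(0, 0, 0), (0, 0, 0), (1, 1, 1), (3, 4, 0)]
)
print(distances)
```

Restricting the computation to cysteine pairs keeps the sketch focused on the candidates for bonding; distances between all residue pairs would use the same call on every index pair.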


Prediction of protein structure is one of the most important problems in computational biology, and it remains one of the biggest challenges in structural biology. Disulfide bonds play an important structural role in stabilizing protein conformations. For protein-folding prediction, a correct prediction of disulfide bridges can greatly reduce the search space. Because disulfide bonds impose geometrical constraints on the protein backbone, predicting the disulfide bonding pattern helps, to a certain degree, to predict the 3D structure of a protein and hence its function.
In this dissertation, we first used the position-specific scoring matrix (PSSM), normalized bond lengths, the predicted secondary structure of the protein, and the physicochemical property indices of the amino acids as the features for a classifier based on the support vector machine (SVM). The classifier was trained to compute the connectivity probabilities of cysteine pairs. In addition, an evolutionary algorithm called multiple trajectory search (MTS) was integrated with the SVM model to tune the parameters of the SVM and the window sizes for the features. The maximum weighted perfect matching algorithm was then used to find the disulfide connectivity pattern. With prior knowledge of the bonding states of cysteines, the experimental results show that the accuracy reaches 79.8% for the prediction of the overall disulfide connectivity pattern (QP) and 80.9% for disulfide bridge prediction (QC) on dataset SP39. Without prior knowledge of the bonding states of cysteines, the accuracy reaches 54.5% (QP) and 60% (QC), respectively, on dataset SPX.
Then, features related to the protein 3D structure, called the normalized pair distance (NPD) vector, were introduced. In experiments, we obtained good performance on four problems in disulfide bond prediction. With prior knowledge of the bonding states of cysteines, the accuracy reaches 92.2% (QP) and 94.2% (QC), respectively, on dataset SP39. Without prior knowledge of the bonding states of cysteines, the accuracy reaches 84.4% (QP) and 94.6% (QC), respectively, on dataset SPX.
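The matching step can be illustrated with a small sketch. It assumes the SVM has already produced a bonding probability for every cysteine pair, and it finds the maximum weight perfect matching by exhaustive enumeration rather than by the graph algorithm used in the dissertation; the function name `best_pairing` and the example probabilities are hypothetical. Enumeration is only practical for a handful of cysteines, since 2n cysteines admit (2n-1)!! distinct pairings.

```python
def best_pairing(cysteines, prob):
    """Return (total_weight, pairs) for the heaviest perfect matching.

    cysteines -- list of cysteine positions (even length)
    prob      -- dict mapping (i, j) with i < j to a bonding probability
                 (assumed to be the classifier output for that pair)
    """
    if len(cysteines) % 2:
        raise ValueError("a perfect matching needs an even number of cysteines")
    if not cysteines:
        return 0.0, []
    first, rest = cysteines[0], cysteines[1:]
    best_weight, best_pairs = float("-inf"), []
    # Pair the first cysteine with each candidate partner, then solve the rest
    for k, partner in enumerate(rest):
        remaining = rest[:k] + rest[k + 1:]
        sub_weight, sub_pairs = best_pairing(remaining, prob)
        key = (min(first, partner), max(first, partner))
        weight = prob[key] + sub_weight
        if weight > best_weight:
            best_weight, best_pairs = weight, [key] + sub_pairs
    return best_weight, best_pairs


# Hypothetical probabilities for four bonded cysteines
pair_prob = {
    (5, 12): 0.2, (5, 30): 0.9, (5, 41): 0.4,
    (12, 30): 0.3, (12, 41): 0.8, (30, 41): 0.1,
}
weight, pattern = best_pairing([5, 12, 30, 41], pair_prob)
print(pattern)  # [(5, 30), (12, 41)]
```

Choosing the pairing with the largest total probability, instead of greedily taking the single most probable pair first, is what guarantees that every cysteine ends up in exactly one bridge.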
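For readers unfamiliar with the two accuracy measures, the sketch below shows one way to compute them. It assumes the definitions commonly used in the disulfide connectivity literature: QP is the fraction of proteins whose entire bonding pattern is predicted correctly, and QC is the fraction of true disulfide bridges that are predicted correctly. The dissertation's own definitions are given in its Chapter 4 and are not reproduced in this record, so the function `qp_qc` and the example data are illustrative only.

```python
def qp_qc(true_patterns, predicted_patterns):
    """Return (QP, QC) for aligned lists of per-protein bonding patterns.

    Each pattern is an iterable of (i, j) cysteine position pairs.
    """
    if not true_patterns or len(true_patterns) != len(predicted_patterns):
        raise ValueError("need one predicted pattern per true pattern")
    correct_patterns = 0
    correct_bridges = 0
    total_bridges = 0
    for truth, pred in zip(true_patterns, predicted_patterns):
        # Normalize pair order so (3, 1) and (1, 3) count as the same bridge
        t = {tuple(sorted(p)) for p in truth}
        q = {tuple(sorted(p)) for p in pred}
        correct_patterns += int(t == q)
        correct_bridges += len(t & q)
        total_bridges += len(t)
    if total_bridges == 0:
        raise ValueError("no true bridges to evaluate")
    return correct_patterns / len(true_patterns), correct_bridges / total_bridges


# Two hypothetical proteins: the first pattern is fully correct,
# the second has one of its three bridges right
truth = [[(1, 2), (3, 4)], [(1, 2), (3, 4), (5, 6)]]
pred = [[(1, 2), (3, 4)], [(1, 2), (3, 5), (4, 6)]]
print(qp_qc(truth, pred))  # (0.5, 0.6)
```

Under these definitions QC can never be lower than QP on the same predictions, which is consistent with every QP and QC pair reported above.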


Chapter 1 Introduction
1.1 Background and Motivation
1.2 Disulfide Bond
1.3 Disulfide Bond Prediction
1.4 Organization of the Dissertation
Chapter 2 Literature Review
Chapter 3 Methods
3.1 Support Vector Machine
3.2 Multiple Trajectory Search
3.2.1 Orthogonal Array and Simulated Orthogonal Array
3.2.2 Multiple Trajectory Search
3.3 Construction of the Prediction Model Based on the SVM
3.4 Multiple Trajectory Search for Selecting SVM Parameters and Window Sizes
3.5 Maximum Weighted Perfect Matching
Chapter 4 Datasets and Evaluation Criteria Indexes
4.1 The Datasets for Protein Chain and Residues Classification
4.2 The Datasets for Bridge Classification and the Prediction of Disulfide Bonding Pattern
4.3 Evaluation Criteria Indexes for Residues Classification
4.4 Evaluation Criteria Indexes for Bridge Classification and the Prediction of Disulfide Bonding Pattern
Chapter 5 Prediction of Disulfide Bonding Pattern Using Secondary Structure Information and Sequence Profiles
5.1 Prediction with the Prior Knowledge of the Bonding States of Cysteines
5.1.1 Features
5.1.2 Methodology
5.1.3 The Experimental Results
5.1.4 Holdout Test on New Sequences from SP43 and SP56
5.1.5 Web Server for the Prediction of Disulfide Bonding Pattern
5.1.6 Conclusions
5.2 Prediction without the Prior Knowledge of the Bonding States of Cysteines
5.2.1 Features
5.2.2 The Coding Phase of the Prediction Model Training
5.2.3 Probability Threshold
5.2.4 Experiment Results
5.2.5 Conclusions
Chapter 6 Disulfide Bonding Pattern Prediction Server Using the Feature NPD
6.1 The Feature
6.2 DBPPS: Disulfide Bonding Pattern Prediction Server with the Prior Knowledge of the Bonding State of Cysteines
6.2.1 Methodology
6.2.2 Experimental Results
6.2.3 Inputs of the DBPPS
6.2.4 Outputs of the DBPPS
6.2.5 Conclusions
6.3 DBCP: A Web Server for Disulfide Bonding Connectivity Prediction without the Prior Knowledge of the Bonding State of Cysteines
6.3.1 The Threshold for Pair Distance
6.3.2 The Coding Phase of the Prediction Model Training
6.3.3 Modified Maximum Weighted Perfect Matching Algorithm
6.3.4 The Prediction Steps
6.3.5 An Example of the Prediction Flow
6.3.6 Results and Discussions
6.3.6.1 Protein Chains Classification
6.3.6.2 Residues Classification
6.3.6.3 Bridge Classification and Prediction of the Disulfide Bonding Pattern
6.3.6.4 Evaluation of the Web Server
6.3.7 Limitations
6.3.8 Inputs of the DBCP
6.3.9 Outputs of the DBCP
6.3.10 Conclusions
Chapter 7 Conclusions and Future Works
Bibliography



