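Graduate student: 黃鼎耀 (Ding-Yao Huang)
Thesis title: Coordinate Refinement on All Atoms of the Protein Backbone with Support Vector Regression
Thesis title (original Chinese): 利用支援向量迴歸於蛋白質骨幹原子座標之修正方法
Advisor: 楊昌彪 (Chang-Biau Yang)
Degree: Master's
Institution: National Sun Yat-sen University (國立中山大學)
Department: Department of Computer Science and Engineering
Discipline: Engineering
Field: Electrical and Computer Engineering
Thesis type: Academic thesis
Year of publication: 2015
Graduation academic year: 104 (ROC calendar)
Language: English
Number of pages: 67
Keywords: bioinformatics, protein backbone, three-dimensional coordinates, prediction, support vector regression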
Protein structure prediction has been studied in bioinformatics for decades. The protein backbone reconstruction problem (PBRP) is to reconstruct the 3D coordinates of all atoms on the protein backbone, given a target protein sequence and its Cα coordinates. To improve prediction accuracy, we refine the 3D coordinates of all backbone atoms with support vector regression (SVR). We use the coordinates predicted by two well-performing backbone prediction methods, PD2 and BBQ, as our feature candidates. Accordingly, we define more than 100 possible features. After computing their correlations, we identify several features that are strongly related to the prediction target. We then run leave-one-protein-out and 5-fold cross-validation experiments on datasets covering CASP7 through CASP11. The experimental results show that our method improves the average RMSD by about 8% over PD2, making it the most accurate predictor for this problem.
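The evaluation quantities named in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation: the function names, the toy data layout, and the choice to compute RMSD without a superposition step (assuming predicted and reference coordinates already share the frame fixed by the given Cα atoms) are all assumptions made for illustration. The SVR training itself is omitted.

```python
import math


def rmsd(pred, ref):
    """RMSD between two equally long lists of (x, y, z) atom coordinates.

    Assumption: both structures are already in the same coordinate frame,
    so no superposition (e.g. Kabsch alignment) is performed.
    """
    if len(pred) != len(ref) or not pred:
        raise ValueError("coordinate lists must be non-empty and equally long")
    squared = sum((p - r) ** 2 for a, b in zip(pred, ref) for p, r in zip(a, b))
    return math.sqrt(squared / len(pred))


def pearson(xs, ys):
    """Pearson's correlation coefficient between a feature and the target.

    Returns 0.0 for a constant input, where the coefficient is undefined
    (a convention chosen for this sketch).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def relative_improvement(baseline_rmsd, refined_rmsd):
    """Fractional RMSD reduction of a refined prediction over a baseline."""
    return (baseline_rmsd - refined_rmsd) / baseline_rmsd


def leave_one_protein_out(protein_ids):
    """Yield (held_out_id, train_indices, test_indices) for each protein.

    protein_ids[i] is the (hypothetical) protein label of sample i; every
    sample of the held-out protein goes to the test split.
    """
    for held_out in sorted(set(protein_ids)):
        train = [i for i, p in enumerate(protein_ids) if p != held_out]
        test = [i for i, p in enumerate(protein_ids) if p == held_out]
        yield held_out, train, test
```

In such a setup, `pearson` would rank candidate features against the prediction target, `leave_one_protein_out` would keep all atoms of one protein out of training, and `relative_improvement` would express a result such as the reported gain over PD2 as a fraction of the baseline RMSD.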
Thesis Approval Form (Chinese)
Thesis Approval Form (English)
Acknowledgments
Abstract (Chinese)
Abstract (English)
List of Figures
List of Tables
Chapter 1. Introduction
Chapter 2. Preliminaries
2.1 Proteins and Amino Acids
2.2 Root Mean Square Deviation
2.3 Pearson's Correlation Coefficient
2.4 Support Vector Regression
2.5 Previous Work
2.5.1 SABBAC
2.5.2 Wang's Method
2.5.3 Chang's Method
2.5.4 BBQ
2.5.5 Yen's Method
2.5.6 Chen's Method
2.5.7 Wu's Method
2.5.8 PD2
Chapter 3. The Proposed Method
3.1 Motivation
3.2 Feature Generation and Feature Selection
3.3 The Difference Prediction Method
Chapter 4. Experimental Results
Chapter 5. Conclusion
Bibliography