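Graduate student: 蕭遠平
Graduate student (English name): Yuan-Ping Hsiao
Thesis title: 蛋白質單點突變穩定性預測-機器學習方法與特徵值的選擇
Thesis title (English): Prediction on the protein stability change after single-point mutation- selection of machine learning algorithms and features
Advisor: 劉俊宏
Oral defense committee: 鄒文雄, 朱彥煒
Oral defense date: 2017-07-24
Degree: Master's
Institution: National Chung Hsing University (國立中興大學)
Department: Graduate Institute of Genomics and Bioinformatics (基因體暨生物資訊學研究所)
Discipline: Life Sciences
Field: Bioinformatics
Thesis type: Academic thesis
Publication year: 2017
Graduation academic year: 105 (ROC calendar)
Language: Chinese
Number of pages: 75
Keywords (Chinese): 蛋白質穩定性; 機器學習法; 蛋白質突變
Keywords (English): protein stability; machine learning; mutations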
Many software tools can now estimate the change in protein stability caused by a single-point mutation, providing reference information for related experiments. Machine learning is a common approach: a predictive model is built from a chosen algorithm and a set of protein features. In this study, we cross-tested several machine learning methods, feature selection methods, and protein features generated by different tools, to examine how the resulting models perform under different combinations. We used the homology modeling program Modeller to model the mutant structures, then extracted structural features from the wild-type structures and the modeled mutant structures. The results show that three sequence features (PSSM, HMMER, HMM) and two structural features, one based on surrounding contact atoms (PDB contact atom) and one based on solvent-accessible surface area (POPS), showed the best predictive correlation, and that Random Forest had the best predictive performance among the machine learning methods. In the blind test on the test set, the final model reached a correlation coefficient (CC) of 0.670, a mean absolute error (MAE) of 0.924, and a root-mean-square error (RMSE) of 1.216. Although the model does not outperform other prediction tools, this study can serve as a reference for selecting machine learning methods and protein features in future research.
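The three evaluation measures reported in the abstract can be illustrated with a minimal sketch. It assumes that CC denotes the Pearson correlation coefficient between experimental and predicted stability changes, which the record does not state explicitly. The function names and the sample values are hypothetical and are not taken from the thesis.

```python
import math


def pearson_cc(y_true, y_pred):
    """Pearson correlation coefficient (assumed meaning of CC)."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)


def mean_absolute_error(y_true, y_pred):
    """Average of the absolute prediction errors (MAE)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def root_mean_square_error(y_true, y_pred):
    """Square root of the mean squared prediction error (RMSE)."""
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


# Made-up stability changes for four mutations (illustration only).
experimental = [-1.0, 0.0, 1.0, 2.0]
predicted = [-0.5, 0.5, 0.5, 1.5]

print("CC  ", round(pearson_cc(experimental, predicted), 3))
print("MAE ", round(mean_absolute_error(experimental, predicted), 3))
print("RMSE", round(root_mean_square_error(experimental, predicted), 3))
```

A higher CC and lower MAE and RMSE indicate closer agreement between predicted and experimental values, which is how the figures 0.670, 0.924, and 1.216 above would be read.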
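Acknowledgments
Abstract (Chinese)
Abstract (English)
Table of Contents
List of Tables
List of Figures
List of Appendices
Chapter 1 Introduction
1.1 Research Background and Motivation
1.2 Current Prediction Tools
1.3 ProTherm, a Database of Protein Mutation Experiments
1.4 Weka
1.5 Random Forest
1.6 Objectives and Results of This Study
Chapter 2 Methods and Tools
2.1 Protein Single-Point Mutation Dataset
2.2 Protein Features
2.2.1 Modeller Features
2.2.2 Gromacs Features
2.2.3 FoldX Features
2.2.4 POPS Features
2.2.5 DSSP Features
2.2.6 LIGPLOT Features
2.2.7 PDB Contact Atom Features
2.2.8 PSSM Features
2.2.9 HMM Features
2.2.10 HHM Features
2.3 Machine Learning Algorithms
2.4 Feature Selection
2.5 Feature Data Processing
2.6 Evaluation of Prediction Models
2.7 Experimental Strategies
2.7.1 Strategy 1
2.7.2 Strategy 2
Chapter 3 Results and Discussion
3.1 Feature Combination and Machine Learning Methods
3.2 Feature Reduction
3.3 Data Processing
3.4 Strategy 2: Feature Reduction Within Individual Feature Types
3.5 Accuracy of the Prediction Model and Comparison with Existing Prediction Tools
3.6 Changes in Accuracy Under Different Conditions
Chapter 4 Conclusion
Figures and Tables
References
Appendices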
1.Parthiban, V., M.M. Gromiha, and D. Schomburg, CUPSAT: prediction of protein stability upon point mutations. Nucleic Acids Research, 2006. 34: p. W239-W242.
2.Deutsch, C. and B. Krishnamoorthy, Four-body scoring function for mutagenesis. Bioinformatics, 2007. 23(22): p. 3009-3015.
3.Zhou, H.Y. and Y.Q. Zhou, Distance-scaled, finite ideal-gas reference state improves structure-derived potentials of mean force for structure selection and stability prediction. Protein Science, 2002. 11(11): p. 2714-2726.
4.Worth, C.L., R. Preissner, and T.L. Blundell, SDM-a server for predicting effects of mutations on protein stability and malfunction. Nucleic Acids Research, 2011. 39: p. W215-W222.
5.Guerois, R., J.E. Nielsen, and L. Serrano, Predicting changes in the stability of proteins and protein complexes: A study of more than 1000 mutations. Journal of Molecular Biology, 2002. 320(2): p. 369-387.
6.Capriotti, E., P. Fariselli, and R. Casadio, I-Mutant2.0: predicting stability changes upon mutation from the protein sequence or structure. Nucleic Acids Research, 2005. 33: p. W306-W310.
7.Cheng, J.L., A. Randall, and P. Baldi, Prediction of protein stability changes for single-site mutations using support vector machines. Proteins-Structure Function and Bioinformatics, 2006. 62(4): p. 1125-1132.
8.Teng, S.L., A.K. Srivastava, and L.J. Wang, Sequence feature-based prediction of protein stability changes upon amino acid substitutions. BMC Genomics, 2010. 11: p. 8.
9.Giollo, M., et al., NeEMO: a method using residue interaction networks to improve prediction of protein stability upon mutation. BMC Genomics, 2014. 15.
10.Fariselli, P., et al., INPS: predicting the impact of non-synonymous variations on protein stability from sequence. Bioinformatics, 2015. 31(17): p. 2816-2821.
11.Huang, L.T., M.M. Gromiha, and S.Y. Ho, iPTREE-STAB: interpretable decision tree based method for predicting protein stability changes upon mutations. Bioinformatics, 2007. 23(10): p. 1292-1293.
12.Yin, S.Y., F. Ding, and N.V. Dokholyan, Eris: an automated estimator of protein stability. Nature Methods, 2007. 4(6): p. 466-467.
13.Pires, D.E.V., D.B. Ascher, and T.L. Blundell, mCSM: predicting the effects of mutations in proteins using graph-based signatures. Bioinformatics, 2013. 30(3): p. 335-342.
14.Chen, C.W., J. Lin, and Y.W. Chu, iStable: off-the-shelf predictor integration for predicting protein stability changes. BMC Bioinformatics, 2013. 14: p. 14.
15.Pires, D.E.V., D.B. Ascher, and T.L. Blundell, DUET: a server for predicting effects of mutations on protein stability using an integrated computational approach. Nucleic Acids Research, 2014. 42(W1): p. W314-W319.
16.Folkman, L., et al., EASE-MM: Sequence-Based Prediction of Mutation-Induced Stability Changes with Feature-Based Multiple Models. Journal of Molecular Biology, 2016. 428(6): p. 1394-1405.
17.Dehouck, Y., et al., Fast and accurate predictions of protein stability changes upon mutations using statistical potentials and neural networks: PoPMuSiC-2.0. Bioinformatics, 2009. 25(19): p. 2537-2543.
18.Masso, M. and I.I. Vaisman, Accurate prediction of stability changes in protein mutants by combining machine learning with structure based computational mutagenesis. Bioinformatics, 2008. 24(18): p. 2002-2009.
19.Laimer, J., et al., MAESTRO - multi agent stability prediction upon point mutations. BMC Bioinformatics, 2015. 16: p. 13.
20.Gromiha, M.M., et al., ProTherm: Thermodynamic database for proteins and mutants. Nucleic Acids Research, 1999. 27(1): p. 286-288.
21.Frank, E., et al., Data mining in bioinformatics using Weka. Bioinformatics, 2004. 20(15): p. 2479-2481.
22.Breiman, L., Random forests. Machine Learning, 2001. 45(1): p. 5-32.
23.Ho, T.K., Random decision forests, in Proceedings of the Third International Conference on Document Analysis and Recognition (Volume 1). 1995, IEEE Computer Society. p. 278.
24.Breiman, L., Bagging Predictors. Machine Learning, 1996. 24(2): p. 123-140.
25.Ho, T.K., The random subspace method for constructing decision forests. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1998. 20(8): p. 832-844.
26.Fiser, A. and A. Sali, MODELLER: Generation and refinement of homology-based protein structure models. Macromolecular Crystallography, Pt D, 2003. 374: p. 461-491.
27.Berendsen, H.J.C., D. van der Spoel, and R. van Drunen, GROMACS: A message-passing parallel molecular dynamics implementation. Computer Physics Communications, 1995. 91(1): p. 43-56.
28.Schymkowitz, J., et al., The FoldX web server: an online force field. Nucleic Acids Research, 2005. 33: p. W382-W388.
29.Cavallo, L., J. Kleinjung, and F. Fraternali, POPS: a fast algorithm for solvent accessible surface areas at atomic and residue level. Nucleic Acids Research, 2003. 31(13): p. 3364-3366.
30.Kabsch, W. and C. Sander, Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers, 1983. 22(12): p. 2577-637.
31.Wallace, A.C., R.A. Laskowski, and J.M. Thornton, LIGPLOT: a program to generate schematic diagrams of protein-ligand interactions. Protein Eng, 1995. 8(2): p. 127-34.
32.Stormo, G.D., et al., Use of the ‘Perceptron’ algorithm to distinguish translational initiation sites in E. coli. Nucleic Acids Research, 1982. 10(9): p. 2997-3011.
33.Eddy, S.R., Profile hidden Markov models. Bioinformatics, 1998. 14(9): p. 755-63.
34.Soding, J., Protein homology detection by HMM-HMM comparison. Bioinformatics, 2005. 21(7): p. 951-60.
35.Quan, L., Q. Lv, and Y. Zhang, STRUM: structure-based prediction of protein stability changes upon single-point mutation. Bioinformatics, 2016. 32(19): p. 2936-2946.