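Author: 王智炫 (Chih-Hsuan Wang)
Thesis title: An integrated server for predicting protein stability changes by feature selection and various modules
Thesis title (original): 透過特徵選擇與模組化建置預測蛋白質穩定性的整合系統
Advisor: 朱彥煒 (Yen-Wei Chu)
Oral defense committee: 謝立青 (Li-Ching Hsieh), 董其樺 (Chi-Hua Tung)
Oral defense date: 2016-07-25
Degree: Master's
Institution: National Chung Hsing University (國立中興大學)
Department: Institute of Genomics and Bioinformatics (基因體暨生物資訊學研究所)
Discipline: Life Sciences
Field: Bioinformatics
Thesis type: Academic thesis
Year of publication: 2016
Academic year of graduation: 104 (ROC calendar)
Language: Chinese
Number of pages: 54
Keywords (Chinese): 蛋白質穩定性; 胺基酸單點突變; 機器學習
Keywords (English): Protein stability change; Single point mutation; Machine learning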
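Protein structure is closely tied to protein function. A single point mutation at an amino acid residue can severely alter the overall protein structure and thereby disrupt function. Single point mutations can also change the stability of a protein structure through small changes in the folding free energy (ΔG, dG), and the difference in folding free energy change between the wild-type and mutant protein (ΔΔG, ddG) is commonly used as a measure of the change in protein stability. Many protein stability prediction tools exist, but their predictions do not always agree, which leaves users uncertain when they have to make a decision. This study therefore integrates the predictions of existing tools, adds protein sequence properties as encoded features, and applies feature selection and machine learning to improve prediction accuracy. It also presents an integrated prediction system that accepts the mutation position, temperature, pH, and protein sequence or structure information as input. The system contains three modules: the Website Module, the Sequence Module, and the Stand-alone Module. The Stand-alone Module maintains a baseline level of accuracy so that predictions are not affected when other online systems are unavailable. In terms of results, the accuracy of iStable2.0 reaches an MCC (Matthews correlation coefficient) of 0.70, and its ΔΔG prediction reaches a PCC (Pearson correlation coefficient) of 0.81, outperforming existing prediction tools. It also achieves reasonable accuracy on a dataset of p53, a suppressor protein found in many tumors.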
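The ΔΔG quantity described above can be written out explicitly. The record does not state a sign convention, so the form below is an assumption: it takes ΔG as the free energy of unfolding, in which case a positive ΔΔG indicates a stabilizing mutation. Some tools use the opposite sign.

```latex
\Delta\Delta G = \Delta G_{\text{mutant}} - \Delta G_{\text{wild-type}}
```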

A single mutation at an amino acid residue may cause a severe change in the whole protein structure and thus lead to disruption of function. iStable is an integrated predictor of protein stability changes that now provides results from both a structure-based model and a sequence-based model. To make the prediction system more robust, iStable2.0 adds stand-alone and sequence modules. iStable2.0 uses feature selection algorithms to increase prediction accuracy and reduce execution time. It also provides a graphical user interface that presents the prediction result, ddG value, secondary structure, and solvent accessibility, making changes in this information easier to observe. iStable2.0 achieves an accuracy of up to 70% and a PCC (Pearson correlation coefficient) of 0.81 with the regression model, the best performance among off-the-shelf predictors. iStable2.0 also reaches a PCC of 0.577 on a newly generated dataset consisting of a number of variations occurring in the tumor suppressor protein p53. An online web server is freely available at http://predictor.nchu.edu.tw/istable2.
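The abstracts report performance as MCC for classification and PCC for ΔΔG regression. As a reading aid, the following is a minimal sketch of how these two metrics are conventionally computed. It is not code from the thesis: the function names `mcc` and `pcc`, and the label encoding (1 = stabilizing, 0 = destabilizing), are assumptions made for illustration.

```python
import math

def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels.

    Assumed encoding (hypothetical): 1 = stabilizing, 0 = destabilizing.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0

def pcc(x, y):
    """Pearson correlation coefficient between two equal-length sequences,
    for example experimental and predicted ddG values."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    denom = math.sqrt(var_x * var_y)
    # Undefined for constant input; return 0 in that case.
    return cov / denom if denom else 0.0
```

Both values range from -1 to 1. MCC uses all four cells of the confusion matrix, which makes it a common choice when the stabilizing and destabilizing classes are imbalanced.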

Acknowledgements
Abstract (Chinese)
Abstract
Content
Content of Figures
Content of Tables
1 Introduction
1.1 Background
1.2 Motivation
2 Related Works
2.1 Cross-validation
2.2 NetSurfP
2.3 Support Vector Machine
2.4 Evaluation classification
2.5 Weka
3 Materials and Methods
3.1 Dataset
3.2 Feature encoding
3.2.1 Sequence based features
3.2.1.1 Binary
3.2.1.2 Physicochemical and biochemical properties
3.2.2 Structure based features
3.2.2.1 Relative/Absolute surface accessibility, RSA/ASA
3.2.2.2 Secondary structure, SS
3.2.3 Website result features
3.3 Input module
3.3.1 Website module (WM)
3.3.2 Stand-alone module (SAM)
3.3.3 Sequence module (SM)
3.4 Feature selection
3.5 Learning model construction
4 Result and Discussion
4.1 Comparison of machine learning algorithm
4.2 Comparison of prediction result
4.2.1 Performance of classification model
4.2.2 Performance of regression model
4.3 Structural analysis of predictors' performances
4.4 Performance with different experimental conditions
4.5 Case study
4.6 Web server
4.7 Perspectives
5 Conclusion
6 Reference
7 Supplementary Materials



