Graduate student: Le Nguyen Quoc Khanh (黎阮國慶)
Thesis title: Using RBF networks and deep learning techniques to identify different functions of transport proteins
Thesis title (Chinese): 使用 RBF 網路與深度學習技術來鑑別運輸蛋白的不同功能
Advisor: Yu-Yen Ou (歐昱言)
Oral defense committee: Chi-Fang Lin (林啟芳), Ching-Lueh Chang (張經略), Hsiao-Kuang Wu (吳曉光), Jorng-Tzong Horng (洪炯宗)
Oral defense date: 2018-01-09
Degree: Doctoral
Institution: Yuan Ze University (元智大學)
Department: Department of Computer Science and Engineering (資訊工程學系)
Discipline: Engineering
Field: Electrical and Computer Engineering
Thesis type: Academic thesis
Publication year: 2018
Graduation academic year: 106 (ROC calendar)
Language: English
Number of pages: 87
Keywords: machine learning, deep learning, convolutional neural network, radial basis function network, transport proteins, position specific scoring matrix, membrane proteins, binding site
In recent years, deep learning has emerged as a modern machine learning technique that delivers state-of-the-art performance in a variety of fields. Applying deep learning to improve predictive performance is therefore also an important direction for bioinformatics. In this study, we use radial basis function (RBF) networks and deep learning techniques to identify different functions of transport proteins, which are among the important molecular functions of transmembrane proteins. Transport proteins are proteins in the cell membrane that bind and carry atoms and small molecules within cells and throughout the body. There are many kinds of transport proteins, and they are critical to the growth and life of all living organisms. Electron transport proteins are transport proteins that play an important role in storing and transferring electrons during cellular respiration, the most efficient process by which cells gather energy from consumed food. According to their molecular functions, the components of the electron transport chain are organized into five complexes with several different electron carriers and functions. Identifying and classifying the molecular functions of the electron transport chain is therefore vital for helping biologists understand the transport process and energy production in cells.
This work comprises two phases: identifying electron transport proteins among transport proteins, and predicting binding sites in transport proteins. In the first phase, the feature set combining position specific scoring matrices (PSSM) with AAIndex properties identified electron transport proteins among transport proteins with a sensitivity of 73.2%, a specificity of 94.1%, an accuracy of 91.3%, and a Matthews correlation coefficient (MCC) of 0.64 on the independent data set. The same approach also yields a model for classifying the five complexes with different molecular functions in electron transport proteins: PSSM with AAIndex properties achieved MCCs of 0.51, 0.47, 0.42, 0.74, and 1.00 for the five complexes, respectively, on the independent data set. Furthermore, our deep learning method identified electron transport proteins with a sensitivity of 80.3%, a specificity of 94.4%, an accuracy of 92.3%, and an MCC of 0.71 on the independent data set. In the second phase, we predict FAD and GTP binding sites in transport proteins using PSSM and significant amino acid pairs. Compared with other published methods, our method improves on all of the evaluation metrics. The proposed technique can serve as a powerful tool for investigating transport proteins and can help biologists understand their functions. Moreover, this study provides a basis for further research on applying deep learning in bioinformatics.
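The abstract reports sensitivity, specificity, accuracy, and MCC without stating how they are computed. The sketch below uses the standard confusion-matrix definitions of these four metrics, on the assumption that the thesis follows them. The function name `binary_metrics` and the example counts are hypothetical and are not taken from the thesis.

```python
import math


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard binary-classification metrics from confusion-matrix counts.

    Assumes both classes are present (tp + fn > 0 and tn + fp > 0).
    """
    sensitivity = tp / (tp + fn)  # fraction of true positives recovered
    specificity = tn / (tn + fp)  # fraction of true negatives recovered
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    # MCC is set to 0.0 when a marginal total is zero (a common convention).
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }


# Hypothetical counts chosen only for illustration (not from the thesis).
example = binary_metrics(tp=8, fp=2, tn=18, fn=2)
print(example)
```

Unlike accuracy, MCC takes all four confusion-matrix cells into account, so it stays informative when one class is much larger than the other. That may be why it is reported alongside accuracy here.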
Abstract
Acknowledgements
Contents
List of Tables
List of Figures
Chapter 1. Introduction
1.1 Motivation
1.2 Research background
1.2.1 Machine learning and deep learning in bioinformatics
1.2.2 Electron transport protein
1.2.3 Five complexes in electron transport chain
1.2.4 Nucleotide binding sites in transport proteins
1.3 Scope of the study
1.4 Organization of the study
Chapter 2. Literature Review
2.1 Introduction
2.2 Related researches on deep learning
2.3 Related researches on electron transport protein
2.4 Related researches on nucleotide binding sites
2.5 The purpose of the study and motivation research
Chapter 3. Identifying Electron Transport Proteins using Radial Basis Function Networks and Biochemical Properties
3.1 Data collection
3.1.1 Electron transport protein and corresponding molecular function
3.1.2 Pre-processing data
3.2 Feature extractions
3.2.1 Composition of amino acids and amino acid pairs
3.2.2 Position specific scoring matrices
3.2.3 F-score
3.2.4 Biochemical properties
3.3 Design of the radial basis function networks
3.4 Effectiveness evaluation method
3.5 Results and discussions
3.5.1 Predictive performance for identifying electron transport proteins in transport proteins with different feature sets
3.5.2 Comparison of the performance for identifying electron transport proteins in transport proteins with different classifier
3.5.3 Predictive performance for identifying the category of electron transport proteins
3.5.4 The statistical analysis in electron transporters and transport proteins
3.5.5 The statistical analysis for identifying the category of electron transport proteins
3.5.6 Identification and classification of new electron transport proteins in transport proteins
Chapter 4. Identifying Electron Transport Proteins with Deep Learning
4.1 Data collection
4.2 Shallow artificial neural networks and deep neural networks
4.3 2D convolutional neural network structure
4.3.1 Input layers
4.3.2 Multiple layers generating for deep neural network
4.3.3 Output layer
4.4 Experimental results
4.4.1 Composition amino acid analysis
4.4.2 Performance of identifying electron transport proteins with various filter layers
4.4.3 Performance of identifying electron transport proteins with various optimizers
4.4.4 Improving performance and preventing overfitting of identifying electron transport proteins
4.4.5 Comparison between proposed method and other classifiers
4.4.6 Web server for identifying the electron transport proteins
4.4.7 Identification and classification of new electron transport proteins in transport proteins
Chapter 5. Prediction of Nucleotide Binding Sites in Transport Proteins
5.1 Data collection
5.2 Feature extractions
5.2.1 Sequence information
5.2.2 PAM250
5.2.3 BLOSUM62
5.2.4 Position specific scoring matrix profiles
5.2.5 Significant amino acid pairs
5.3 Results and discussions
5.3.1 Amino acid composition analysis
5.3.2 Performance in predicting FAD and GTP binding sites in transport proteins by using various window sizes
5.3.3 Performance in predicting FAD and GTP binding sites in transport proteins with different feature sets
5.3.4 Significance analysis based on the proposed method
5.3.5 Performance in predicting FAD and GTP binding sites in transport proteins with different classifiers
5.3.6 Leave-one-out analysis with six FAD binding proteins in electron transport chains
5.3.7 Comparison of the proposed method with another methods
5.3.8 Identification of new binding sites in transport proteins
5.3.9 Web server for predicting binding sites in electron transport protein
Chapter 6. Conclusions
6.1 Research contributions
6.2 Limitations and further study
References