|
|