National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)

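Graduate student: Ngoc Trinh Cao (高玉真)
Thesis title: Improving Allergenic Protein Identification using Integrated Machine Learned Approach (以整合性機器學習法提升過敏原蛋白質之辨識)
Advisor: Chin-Sheng Yu (游景盛)
Committee members: Yi-Chung Chen (陳奕中), Yu-Ching Chen (陳玉菁), Chin-Sheng Yu (游景盛)
Oral defense date: 2018-01-22
Degree: Master's
Institution: Feng Chia University (逢甲大學)
Department: Department of Information Engineering and Computer Science (資訊工程學系)
Discipline: Engineering
Field: Electrical and Computer Engineering
Thesis type: Academic thesis
Publication year: 2018
Graduation academic year: 106 (ROC calendar, 2017-2018)
Language: English
Pages: 48
Keywords: prediction; allergy; allergenic protein; machine learning; support vector machine; genetic algorithm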
Allergy is an abnormal reaction of the body to certain substances, in which the immune system overreacts by producing Immunoglobulin E (IgE) antibodies; the resulting symptoms include fever, food allergy, atopic dermatitis, and anaphylaxis. In recent years, the number of allergic reactions has increased significantly, driven largely by the development of modified proteins in foods, therapeutics, and biopharmaceuticals, and the prediction of allergenic proteins (allergens) has become an urgent problem. Although many methods have been developed for allergenic protein prediction, more precise and sophisticated methods are still needed, especially for discriminating allergens from allergen-like non-allergens. To address this problem, we developed a new approach that combines a Support Vector Machine (SVM) with a Genetic Algorithm (GA) operating on n-peptide composition features to improve the accuracy of allergenic protein identification. In our experiments, we not only improved prediction performance but also learned which protein properties carry allergenic information. Although the sensitivity is slightly lower than that of a previous method (78.4%), on an independent dataset of 1384 protein sequences our approach achieves an accuracy of 96.6%, a Matthews correlation coefficient (MCC) of 0.805, and a precision of 87%. Our approach predicts allergen-like non-allergen sequences very well and is a promising tool for distinguishing allergenic from non-allergenic proteins.
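The abstract reports sensitivity, precision, accuracy, and MCC. As a reading aid, the following minimal sketch shows how these four quantities are conventionally derived from confusion-matrix counts. The function name `classification_metrics` and the example counts are assumptions made for illustration; they are not taken from the thesis and do not reproduce its results.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Compute standard binary-classification metrics from confusion-matrix counts.

    tp/fp/tn/fn are the numbers of true positives, false positives,
    true negatives, and false negatives. Undefined ratios fall back to 0.0.
    """
    total = tp + fp + tn + fn
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0   # recall on allergens
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    accuracy = (tp + tn) / total if total else 0.0
    # Matthews correlation coefficient: balanced even when classes are skewed.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "precision": precision,
        "accuracy": accuracy,
        "mcc": mcc,
    }


# Hypothetical counts chosen only to exercise the function.
example = classification_metrics(tp=80, fp=10, tn=900, fn=20)
```

Because non-allergens usually far outnumber allergens in such datasets, accuracy alone can look high even when many allergens are missed, which is why MCC and sensitivity are typically reported alongside it.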
Table of Contents
Acknowledgement
Abstract
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Motivation
1.2 Thesis Organization
Chapter 2 Related Work
2.1 Similarity-based method
2.1.1 Definition of similarity-based method
2.1.2 The model of similarity-based method
2.2 Feature extraction-based method
2.2.1 The model of feature extraction-based method
2.2.2 Some tools of feature extraction-based method
Chapter 3 Feature extraction
3.1 Construct the sequence database of allergens and non-allergens
3.2 Extract characteristic parameters
3.2.1 The Composition
3.2.2 The g-gap dipeptide composition
3.2.3 The partitioned amino acid composition
3.2.4 The local amino acid composition
3.2.5 The natural chemical categories
Chapter 4 Proposed method
4.1 The proposed model
4.2 Support Vector Machine
4.3 Genetic Algorithm
Chapter 5 Experimental Results
5.1 Performance evaluation
5.1.1 The confusion matrix
5.1.2 Classifier evaluation
5.2 Performance results
5.3 Results Analysis
5.3.1 Performance on some different datasets
5.3.2 Comparison with existing methods
5.3.3 Performance of prediction on some species
5.3.4 Comparison with the similarity-based method
5.3.5 Performance of SVMGA on feature selection
Chapter 6 Concluding Remarks
References
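
Sections 3.2.1 and 3.2.2 of the outline name the amino acid composition and the g-gap dipeptide composition as sequence features. This record does not reproduce the thesis's definitions, so the sketch below assumes the definitions commonly used in the literature: the composition is the relative frequency of each of the 20 standard residues, and the g-gap dipeptide composition is the relative frequency of each ordered residue pair separated by g intervening positions. The function names and the choice to skip non-standard residues are illustrative assumptions, and the thesis may define or normalize these features differently.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs


def _clean(seq):
    """Uppercase the sequence and drop non-standard residues (an assumption)."""
    return "".join(aa for aa in seq.upper() if aa in AMINO_ACIDS)


def amino_acid_composition(seq):
    """Return a 20-dimensional vector of residue frequencies."""
    seq = _clean(seq)
    n = len(seq)
    return [seq.count(aa) / n if n else 0.0 for aa in AMINO_ACIDS]


def g_gap_dipeptide_composition(seq, g=0):
    """Return a 400-dimensional vector of pair frequencies.

    A pair is (seq[i], seq[i + g + 1]); g=0 gives adjacent dipeptides.
    """
    seq = _clean(seq)
    n_pairs = len(seq) - g - 1
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for i in range(max(n_pairs, 0)):
        counts[seq[i] + seq[i + g + 1]] += 1
    return [counts[p] / n_pairs if n_pairs > 0 else 0.0 for p in DIPEPTIDES]


# Toy sequence used only to exercise the functions.
aac = amino_acid_composition("ACAC")
dpc_adjacent = g_gap_dipeptide_composition("ACAC", g=0)
dpc_gap1 = g_gap_dipeptide_composition("ACAC", g=1)
```

Fixed-length vectors like these are what make variable-length protein sequences usable as input to a classifier such as an SVM, and they give a feature-selection step such as a GA a well-defined set of dimensions to keep or discard.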



