National Digital Library of Theses and Dissertations in Taiwan

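Graduate student: 黃任瑜
Thesis title (original Chinese): 根據化學分子特徵與其生化路徑資訊以集成式學習方法研究化學分子的致敏性
Thesis title (English): Study of allergenicity for chemical compounds by ensemble learning based on the molecular descriptors and the involved biochemical pathway
Advisor: 游景盛
Oral defense committee: 張貴忠, 林佩君
Defense date: 2023-06-27
Degree: Master's
University: Feng Chia University
Department: Department of Information Engineering and Computer Science
Discipline: Engineering
Field: Electrical and Information Engineering
Thesis type: Academic thesis
Year of publication: 2023
Graduation academic year: 111 (ROC calendar, i.e. the 2022-23 academic year)
Language: Chinese
Number of pages: 96
Keywords: small molecule compounds, allergy, machine learning, ensemble learning, biochemical pathways
Usage statistics: cited 0 times; viewed 75 times; downloaded 0 times; bookmarked 0 times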
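Chinese abstract (translated): Allergic diseases affect a very large share of the world's population, most markedly in developed, industrialized countries, so research on identifying allergens is receiving growing attention in medicine, food, and industry. Over the past few years, most methods have focused on predicting protein allergens and on whether protein vaccines cause allergic reactions. By comparison, research on predicting the allergenicity of the chemical substances we encounter more often in daily life, such as those in food and cosmeceutical ingredients, appears to be still in its early stages. This thesis proposes an ensemble learning strategy that combines several machine learning methods and deep learning algorithms to improve the identification and analysis of compound allergenicity. Unlike protein molecules, which are genetically encoded sequences, chemical compounds are structurally diverse, so the PaDEL software is used to convert each compound into a large number of features of three types: 2D descriptors, 3D descriptors, and fingerprints (FPs). Features are selected with machine learning methods, each classifier then makes its predictions, and the best-performing models are combined into an ensemble. Compared with previous studies, the ensemble model improves overall accuracy by as much as 9%, reaching an accuracy of 0.92, an area under the curve (AUC) of 0.96, and a Matthews correlation coefficient (MCC) of 0.82. In addition, the compounds in the dataset were matched to the biochemical pathways in which they participate in the Kyoto Encyclopedia of Genes and Genomes (KEGG) database. Word-cloud analysis shows that allergenic and non-allergenic compounds each tend toward distinct categories of biochemical pathways, and that the features selected by the machine learning models correspond specifically to these pathways. Besides helping to explain the biological meaning of compound allergenicity, this finding means that the allergenicity of an unknown compound can also be judged by comparing it against the categorized biochemical pathways. Finally, the developed model and the pathway analysis were used to predict the allergenicity of drugs approved by the U.S. Food and Drug Administration (FDA); every prediction agreed with the allergenicity reported in the literature on these drugs, demonstrating the practical value of the model and the pathway analysis.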
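
The abstract says the best-performing classifiers are combined, but it does not state the combination rule. The following is a minimal Python sketch that assumes soft voting: each classifier is assumed to have produced a validation score and positive-class probabilities for the same compounds, the top-k models by validation score are kept, and their probabilities are averaged and thresholded at 0.5. The function names, the value of k, and the toy numbers are hypothetical, not taken from the thesis.

```python
# Minimal soft-voting sketch. ASSUMPTIONS (not from the thesis): every model
# has a validation score and positive-class probabilities for the same test
# compounds; "best" = top-k validation scores; threshold = 0.5.
from typing import Dict, List, Tuple

# model name -> (validation score, probabilities for each test compound)
Results = Dict[str, Tuple[float, List[float]]]


def select_top_models(results: Results, k: int) -> List[str]:
    """Return the names of the k models with the highest validation score."""
    if k < 1:
        raise ValueError("k must be at least 1")
    return sorted(results, key=lambda name: results[name][0], reverse=True)[:k]


def soft_vote(
    results: Results, k: int, threshold: float = 0.5
) -> Tuple[List[float], List[int]]:
    """Average the top-k models' probabilities, then label 1 if >= threshold."""
    chosen = select_top_models(results, k)
    n_compounds = len(results[chosen[0]][1])
    averaged = [
        sum(results[name][1][i] for name in chosen) / len(chosen)
        for i in range(n_compounds)
    ]
    labels = [1 if p >= threshold else 0 for p in averaged]
    return averaged, labels


if __name__ == "__main__":
    # Hypothetical numbers, for illustration only.
    toy = {
        "model_a": (0.90, [0.80, 0.30, 0.60]),
        "model_b": (0.88, [0.70, 0.20, 0.40]),
        "model_c": (0.75, [0.10, 0.90, 0.50]),
    }
    probs, labels = soft_vote(toy, k=2)
    print(probs, labels)
```

With k = 2 in the toy run, model_c is ignored and the averaged probabilities are about 0.75, 0.25 and 0.5, giving labels 1, 0 and 1. Ranking models by a validation score rather than the test score keeps the ensemble from being tuned on the same data used to report its accuracy.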

English abstract: Allergic diseases affect large populations worldwide, especially in developed, industrialized countries, so research on allergen identification has received growing attention in medicine, food, and industry. In recent years, most methods have focused on predicting protein allergens and on whether protein vaccines cause allergic reactions. By contrast, research on predicting the allergenicity of chemical substances in food and cosmeceutical ingredients, which we are exposed to more often in daily life, appears to be in its infancy. This thesis proposes an ensemble learning strategy that integrates multiple machine learning methods and deep learning algorithms to improve the identification and analysis of compound sensitization. Unlike protein molecules, whose sequences derive from biological inheritance, structurally diverse compounds are converted by the PaDEL software into a large number of features of three types: 2D, 3D, and fingerprints (FPs). After feature selection with machine learning methods, each classifier makes its predictions, and the best models are integrated into an ensemble. Compared with previous studies, the ensemble learning model improves overall accuracy by as much as 9%, achieving an accuracy of 0.92, an area under the curve (AUC) of 0.96, and a Matthews correlation coefficient (MCC) of 0.82. In addition, we identified the biochemical pathways in which the dataset's compounds participate, using the Kyoto Encyclopedia of Genes and Genomes (KEGG) database.
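
Accuracy, AUC, and MCC, the three figures reported above, can all be computed from true labels, predicted labels, and predicted scores. This standard-library Python sketch assumes binary labels (1 = allergenic, 0 = non-allergenic). It computes AUC in its pairwise Mann-Whitney form, the probability that a randomly chosen positive is scored above a randomly chosen negative with ties counting one half, which equals the area under the ROC curve. The function names are illustrative, not the thesis's code.

```python
# Sketch of the reported metrics; labels assumed binary (1 = allergenic).
import math
from typing import List, Tuple


def confusion(y_true: List[int], y_pred: List[int]) -> Tuple[int, int, int, int]:
    """Return (TP, TN, FP, FN) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(y_true: List[int], y_pred: List[int]) -> float:
    tp, tn, fp, fn = confusion(y_true, y_pred)
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(y_true: List[int], y_pred: List[int]) -> float:
    """Matthews correlation coefficient; 0.0 if any marginal count is zero."""
    tp, tn, fp, fn = confusion(y_true, y_pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def auc(y_true: List[int], scores: List[float]) -> float:
    """Pairwise (Mann-Whitney) AUC: P(score of positive > score of negative)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

For example, with true labels [1, 1, 0, 0], predicted labels [1, 0, 0, 0] and scores [0.9, 0.4, 0.3, 0.1], accuracy is 0.75, MCC is 2/√12 ≈ 0.577, and AUC is 1.0. Because MCC uses all four cells of the confusion matrix, it is often preferred to accuracy when the classes are imbalanced.
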
Word-cloud analysis showed that sensitizing and non-sensitizing compounds each tend toward their own distinct categories of biochemical pathways, and that the features selected by the machine learning models correspond specifically to these pathways. Beyond improving our understanding of the biological significance of compound allergenicity, this finding allows the allergenicity of an unknown compound to be judged by comparing it with the categorized biochemical pathways. Finally, the developed model and the pathway analysis were used to predict the allergenicity of U.S. Food and Drug Administration (FDA)-approved drugs. All predictions were consistent with the hypersensitivity reported in the literature on these drugs, illustrating the utility of the developed model and the biochemical pathway analysis.
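
A word cloud sizes each term by its frequency, so the pathway tendency described above rests on per-class category counts. The sketch below, under illustrative assumptions, compares how often each pathway category appears among allergenic versus non-allergenic compounds using an add-one-smoothed log ratio, and scores an unseen compound by summing the ratios of its categories. It assumes each compound has already been mapped to a list of KEGG pathway category names; the category strings are placeholders rather than results from the thesis, and the summed-ratio score is a hypothetical heuristic, not the thesis's procedure.

```python
# Illustrative only: category names are placeholders, and the scoring rule
# is a hypothetical heuristic, not the method used in the thesis.
import math
from collections import Counter
from typing import Dict, List


def category_counts(compounds: List[List[str]]) -> Counter:
    """Count how often each pathway category occurs within one class."""
    counts: Counter = Counter()
    for categories in compounds:
        counts.update(categories)
    return counts


def log_ratios(allergenic: Counter, non_allergenic: Counter) -> Dict[str, float]:
    """Add-one-smoothed log ratio of relative frequencies.

    Positive values: category relatively more frequent among allergenic
    compounds; negative values: more frequent among non-allergenic ones.
    """
    vocab = set(allergenic) | set(non_allergenic)
    total_a = sum(allergenic.values()) + len(vocab)
    total_n = sum(non_allergenic.values()) + len(vocab)
    return {
        c: math.log((allergenic[c] + 1) / total_a)
        - math.log((non_allergenic[c] + 1) / total_n)
        for c in vocab
    }


def lean(categories: List[str], ratios: Dict[str, float]) -> float:
    """Hypothetical score for an unseen compound; unknown categories count 0."""
    return sum(ratios.get(c, 0.0) for c in categories)


if __name__ == "__main__":
    allergenic = category_counts([["pathway_X", "pathway_Y"], ["pathway_X"]])
    non_allergenic = category_counts([["pathway_Z"], ["pathway_Z", "pathway_Y"]])
    ratios = log_ratios(allergenic, non_allergenic)
    print(sorted(ratios.items()), lean(["pathway_X"], ratios))
```

In the toy run, pathway_X leans allergenic (ratio ln 3), pathway_Z leans non-allergenic (ratio -ln 3), and pathway_Y, equally frequent in both classes, scores 0. For the visualization itself, a per-class `Counter` like these could be passed to the `wordcloud` package's `WordCloud.generate_from_frequencies`.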
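Acknowledgments
Chinese Abstract
Abstract
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Research Background
1.2 Research Motivation
1.3 Research Objectives
1.4 Thesis Organization
Chapter 2 Literature Review
2.1 Protein Allergenicity Prediction
2.2 Chemical Compound Allergenicity Prediction
Chapter 3 Methods
3.1 Datasets and Databases Used
3.1.1 Immune Epitope Database
3.1.2 Chemical Entities of Biological Interest
3.2 Converting Compounds into Features
3.3 Data Preprocessing
3.4 Feature Selection
3.4.1 Random Forest
3.4.2 eXtreme Gradient Boosting
3.5 Machine Learning Models
3.6 Ensemble Learning
3.7 Evaluation Methods
3.8 Biochemical-Pathway-Based Implementation
3.8.1 KEGG Database
3.8.2 Implementation
Chapter 4 Experimental Results
4.1 Performance Evaluation of the Machine Learning Models
4.1.1 Combining the Best Models
4.2 Important 2D, 3D, and FP Features
4.3 Case Study: Predictions with the Developed Model
Chapter 5 Discussion
5.1 Analysis of Biochemical Pathways
5.1.1 Analysis of Compounds Misclassified by the Model
5.1.2 Analysis of Biochemical Pathway Categories
5.2 Biochemical Pathways of the Case-Study Compounds
5.3 Compounds in the Dataset with No Identified Biochemical Pathway
Chapter 6 Conclusion
References
Appendix 1
Appendix 2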



Electronic full text (publicly available online from 2025-07-01)