臺灣博碩士論文加值系統 (National Digital Library of Theses and Dissertations in Taiwan)

Detailed Record
Author: Ya-Hsun Lee (李雅巽)
Title (Chinese): 使用機器學習梯度提升法優化約登指數以找出最佳生物標記之組合
Title (English): Gradient Boosting the Youden Index – A New Strategy to Find the Optimal Biomarker Combination
Advisors: Yi-Hau Chen (程毅豪), Chao-Yu Guo (郭炤裕)
Degree: Master's
Institution: National Yang-Ming University (國立陽明大學)
Department: Institute of Public Health (公共衛生研究所)
Field: Medicine and Health
Discipline: Public Health
Thesis type: Academic thesis
Year of publication: 2019
Academic year of graduation: 108
Language: English
Pages: 41
Keywords (Chinese): 機器學習; 梯度提升法; 約登指數; 診斷正確率; 最佳生物標記之組合
Keywords (English): Machine Learning; Gradient Boosting; Youden Index; Diagnosis Accuracy; Biomarker Combination
Cited by: 0
Views: 241
Downloads: 0
Bookmarked: 0
Abstract (translated from Chinese):
Objective: Advances in medical technology give us access to increasingly rich biomarker data, and by combining these biomarkers we can assess a patient's disease more comprehensively. Gradient boosting, one of the ensemble learning methods in machine learning, is not only a frequent winner of machine-learning competitions but also offers good model interpretability. The Youden Index quantitatively evaluates a diagnostic cutoff and can therefore be used to optimize overall accuracy. This thesis proposes a new statistical method that combines gradient boosting as the algorithm with the Youden Index as the target function.
Methods: A new smooth function, derived from the sigmoid function, approximates the sign function in the Youden Index, yielding a smoothed Youden Index. Taking the partial derivative of this target function with respect to the output function gives its gradient, and the output function is then optimized along the negative gradient. Repeating this optimization over iterations constitutes gradient boosting with the Youden Index as the target function.
Results: Diagnostic accuracy was evaluated on several real datasets (including small-sample and high-dimensional data) and simulated datasets (linear and nonlinear); the new method performed better on linear and high-dimensional data.
Conclusions: The biomarker combination obtained by gradient boosting the Youden Index is particularly useful for high-dimensional data, offering both good model interpretability and an advantage in classification prediction.
Objective: With the development of medical technology, more and more biomarker information is recorded to evaluate a patient's clinical condition, and combining the related biomarkers gives a more complete picture of the disease. For these reasons, we propose a strategy for combining biomarkers. Gradient boosting is one of the ensemble learning methods in machine learning; it is not only a frequent winning solution in machine-learning competitions but also offers good model interpretability. The Youden Index quantitatively evaluates the cutoff point of a diagnostic test, and we further apply it to optimize overall accuracy. Hence, we propose a new method that uses gradient boosting as the algorithm with the Youden Index as the target function for optimization.
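The Youden Index of a diagnostic cutoff is J = sensitivity + specificity − 1, and the optimal cutoff is the one maximizing J. A minimal sketch of this definition (NumPy assumed; the scores, labels, and cutoff grid are illustrative, not from the thesis):

```python
import numpy as np

def youden_index(scores, labels, cutoff):
    """Youden Index J = sensitivity + specificity - 1 at a given cutoff."""
    pred = scores > cutoff
    sens = np.mean(pred[labels == 1])    # true positive rate
    spec = np.mean(~pred[labels == 0])   # true negative rate
    return sens + spec - 1.0

# illustrative data: diseased (label 1) scores shifted upward
scores = np.array([0.2, 0.4, 0.6, 0.8, 0.3, 0.9, 0.7, 0.1])
labels = np.array([0,   0,   1,   1,   0,   1,   1,   0  ])

# pick the cutoff that maximizes J over a grid
cutoffs = np.linspace(0.0, 1.0, 101)
best = max(cutoffs, key=lambda c: youden_index(scores, labels, c))
```

On this toy data any cutoff between 0.4 and 0.6 separates the classes perfectly, so J reaches its maximum of 1 there.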
Methods: A new smooth function is derived by modifying the sigmoid function. Substituting this smooth function for the sign function in the Youden Index yields a smoothed Youden Index, which is set as the target function. The gradient of the target function is the partial derivative of the smoothed Youden Index with respect to the output function. The output function is updated along the negative gradient, and this optimization is repeated over iterations.
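The thesis does not publish code, so the following is only a plausible sketch of the method under stated assumptions: the indicator in the Youden Index is smoothed with sigmoid((F − c)/h), the bandwidth h and the fixed cutoff c = 0 are assumed, the base learners are componentwise linear fits (an assumption, in the spirit of the mboost-style statistical boosting the thesis builds on), and the smoothed index is maximized by ascending its gradient:

```python
import numpy as np

def sigmoid(z):
    # clipped to avoid overflow warnings for large margins
    return 1.0 / (1.0 + np.exp(-np.clip(z, -50, 50)))

def smooth_youden(F, y, c=0.0, h=0.1):
    """Smoothed Youden Index: the indicator 1{F > c} replaced by sigmoid((F-c)/h)."""
    s = sigmoid((F - c) / h)
    return s[y == 1].mean() - s[y == 0].mean()

def smooth_youden_grad(F, y, c=0.0, h=0.1):
    """Partial derivative of the smoothed index w.r.t. each output F(x_i)."""
    s = sigmoid((F - c) / h)
    g = s * (1.0 - s) / h   # sigmoid'(z) = sigmoid(z) * (1 - sigmoid(z))
    return np.where(y == 1, g / (y == 1).sum(), -g / (y == 0).sum())

def boost_youden(X, y, n_iter=200, nu=0.5, h=0.1):
    """Gradient boosting that ascends the smoothed Youden Index,
    with componentwise linear base learners (one feature per step)."""
    n, p = X.shape
    F = np.zeros(n)      # current output function
    coef = np.zeros(p)   # accumulated biomarker weights
    for _ in range(n_iter):
        g = smooth_youden_grad(F, y, h=h)        # steepest-ascent direction
        betas = X.T @ g / (X ** 2).sum(axis=0)   # least-squares fit per feature
        sse = ((g[:, None] - X * betas) ** 2).sum(axis=0)
        j = sse.argmin()                         # best-fitting single feature
        F += nu * betas[j] * X[:, j]
        coef[j] += nu * betas[j]
    return coef, F

# illustrative two-class data: the signal lives in the first two features
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=100) > 0).astype(int)
coef, F = boost_youden(X, y)
```

The accumulated `coef` is a linear biomarker combination; on this toy data the smoothed index starts at 0 (where sigmoid(0) = 0.5 for every observation) and climbs toward its maximum of 1 as the classes separate.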
Results: Performance in diagnostic accuracy was demonstrated on real data (including high-dimensional and small-sample data) and simulated data (including linear and nonlinear data). Our method achieves a competitive outcome on linear and high-dimensional data.
Conclusions: The biomarker combination derived from the new strategy, gradient boosting the Youden Index, may provide not only good interpretability in a high-dimensional data structure but also competitive prediction accuracy.
Chapter 1 Introduction 1
1.1 Study Background 1
1.2 Study purpose 3
Chapter 2 Literature Review 4
2.1 Youden Index 4
2.1.1 An Introduction of Youden Index 4
2.1.2 The Objective Function of the Youden Index 7
2.2 Gradient Boosting 9
2.2.1 Ensemble Methods 9
2.2.2 Bagging 10
2.2.3 The Beginning Concept of Boosting 11
2.2.4 The First Boosting Algorithm – Adaboost 11
2.2.5 From Machine Learning to Statistical Modeling – Gradient Boosting 13
2.2.6 Gradient Boosting Algorithm 13
2.2.7 Flow Chart of Gradient Boosting 15
Chapter 3 The Proposed New Method – Gradient Boosting the Youden Index 17
3.1 Applying the Youden Index as the Target Function in Gradient Boosting 17
3.2 The Proposed New Smooth Function 18
3.3 Substituting the New Smooth Function for the Sign Function 19
3.4 The Proposed New Negative Gradient 19
3.5 Parameter Setting 20
3.6 Programming approach 20
Chapter 4 Results 21
4.1 Simulation Results 21
4.2 Bupa Liver Disease Data 26
4.2.1 Data Description 26
4.2.2 Cross Validation and Model 27
4.2.3 Result 28
4.3 Leukemia Data 29
4.3.1 Data Description 29
4.3.2 Cross Validation and Model 30
4.3.3 Result 30
4.4 Tuberculosis (TB) Data 31
4.4.1 Data Description 31
4.4.2 Cross Validation and Model 31
4.4.3 Result 32
Chapter 5 Discussion 33
References 35
Attachment 38