Graduate student: Ya-Hsun Lee
Thesis title: Gradient Boosting the Youden Index - A new strategy to find the optimal biomarker combination
Advisors: Yi-Hau Chen, Chao-Yu Guo
Keywords: Machine Learning, Gradient Boosting, Youden Index, Diagnostic Accuracy, Biomarker Combination
Objective: With advances in medical technology, more and more biomarker information is recorded to evaluate a patient's clinical condition. Combining the related biomarkers can give an overall picture of the disease. For these reasons, we propose a strategy for combining biomarkers. Gradient boosting is an ensemble learning method in machine learning; it is frequently among the winning solutions in machine learning competitions and also yields interpretable models. The Youden Index provides a quantitative criterion for choosing the cutoff point of a diagnostic test, and we further apply it to optimize overall accuracy. We therefore propose a new method that uses gradient boosting as the algorithm and the Youden Index as the target function to be optimized.
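For reference, the Youden Index of a cutoff c is J(c) = sensitivity(c) + specificity(c) - 1, and the optimal cutoff is the one that maximizes J. The minimal sketch below computes the empirical version by scanning every observed score. It assumes that label 1 marks diseased subjects and that a score at or above the cutoff counts as test-positive; the function name `youden_index` and the toy data are hypothetical, for illustration only.

```python
import numpy as np

def youden_index(scores, labels):
    """Return (best J, best cutoff) over all observed cutoffs.

    Assumes labels are 1 for diseased and 0 for healthy, and that a
    subject tests positive when its score is >= the cutoff.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    best_j, best_c = -np.inf, None
    for c in np.unique(scores):
        pred = scores >= c
        sens = np.mean(pred[labels == 1])    # true positive rate
        spec = np.mean(~pred[labels == 0])   # true negative rate
        j = sens + spec - 1.0
        if j > best_j:                       # keep the first maximizer
            best_j, best_c = j, c
    return best_j, best_c

scores = [0.1, 0.4, 0.35, 0.8, 0.7, 0.9]
labels = [0, 0, 1, 1, 0, 1]
j, c = youden_index(scores, labels)
print(j, c)
```

On this toy data the scan selects c = 0.8, where two of the three cases and all three controls are classified correctly, so J = 2/3 + 1 - 1 = 2/3.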
Methods: A new smooth function is derived by modifying the sigmoid function. Replacing the sign function in the Youden Index target function with this smooth function yields a new smooth Youden Index, which is used as the target function. The gradient of the target function is the partial derivative of the Youden Index with respect to the output function. The output function is updated along the negative gradient, and this optimization step is repeated for the chosen number of iterations.
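The Methods steps can be illustrated with a short NumPy sketch under stated assumptions. With a combined score f and the cutoff fixed at 0, the indicator I(f > 0) = (1 + sign(f)) / 2 is replaced by a smooth function S, giving J(f) = (1/n1) Σ_cases S(f_i) + (1/n0) Σ_controls [1 - S(f_j)] - 1, whose partial derivative is S'(f_i)/n1 for a case and -S'(f_j)/n0 for a control. The sketch assumes a plain sigmoid with bandwidth h as S (the thesis's own modified sigmoid is not reproduced here) and componentwise linear least-squares base learners, so the fitted score stays a linear biomarker combination. All function names, the simulated data, and the settings `n_iter`, `nu`, and `h` are hypothetical.

```python
import numpy as np

def smooth_step(f, h=0.1):
    # Hypothetical surrogate for I(f > 0) = (1 + sign(f)) / 2: a plain
    # sigmoid with bandwidth h, NOT the thesis's modified sigmoid.
    return 1.0 / (1.0 + np.exp(-f / h))

def smooth_youden(f, y, h=0.1):
    # Smoothed Youden index of the combined score f, cutoff fixed at 0.
    sens = smooth_step(f[y == 1], h).mean()
    spec = (1.0 - smooth_step(f[y == 0], h)).mean()
    return sens + spec - 1.0

def youden_gradient(f, y, h=0.1):
    # Partial derivative of smooth_youden with respect to each f_i.
    s = smooth_step(f, h)
    ds = s * (1.0 - s) / h                      # derivative of the sigmoid
    n1, n0 = np.sum(y == 1), np.sum(y == 0)
    return np.where(y == 1, ds / n1, -ds / n0)

def boost_youden(X, y, n_iter=300, nu=0.5, h=0.1):
    # Componentwise linear least-squares base learners (an assumption):
    # each iteration updates one coefficient. Column 0 of Z is the intercept.
    n, p = X.shape
    Z = np.column_stack([np.ones(n), X - X.mean(axis=0)])
    norms = np.sum(Z ** 2, axis=0) + 1e-12
    coef = np.zeros(p + 1)
    f = np.zeros(n)
    for _ in range(n_iter):
        # The loss is -J, so its negative gradient is the gradient of J.
        g = youden_gradient(f, y, h)
        fits = Z.T @ g / norms                  # LS slope of g on each column
        k = int(np.argmax(fits ** 2 * norms))   # column with the smallest SSE
        coef[k] += nu * fits[k]
        f = Z @ coef
    return coef, f

rng = np.random.default_rng(0)
y = np.repeat([0, 1], 100)
X = rng.normal(size=(200, 3))
X[:, 0] += 2.0 * y                              # only biomarker 0 is informative
coef, f = boost_youden(X, y)
pred = f > 0
emp_j = pred[y == 1].mean() + (~pred[y == 0]).mean() - 1.0
print(coef, emp_j)
```

Because each iteration updates a single coefficient, `coef[1:]` can be read directly as biomarker weights; this is one possible way to keep the combination interpretable, not necessarily the base learner used in the thesis.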
Results: Diagnostic accuracy was evaluated on real data (including high-dimensional data and small-sample data) and on simulated data (including linear and non-linear data). Our method achieves competitive results on linear data and high-dimensional data.
Conclusions: The biomarker combination derived from the new strategy, gradient boosting the Youden Index, may provide not only good interpretability in high-dimensional data but also competitive prediction accuracy.
Chapter 1 Introduction
1.1 Study Background
1.2 Study Purpose
Chapter 2 Literature Review
2.1 Youden Index
2.1.1 An Introduction to the Youden Index
2.1.2 The Objective Function of the Youden Index
2.2 Gradient Boosting
2.2.1 Ensemble Methods
2.2.2 Bagging
2.2.3 The Early Concept of Boosting
2.2.4 The First Boosting Algorithm – AdaBoost
2.2.5 From Machine Learning to Statistical Modeling – Gradient Boosting
2.2.6 Gradient Boosting Algorithm
2.2.7 Flow Chart of Gradient Boosting
Chapter 3 The Proposed New Method – Gradient Boosting the Youden Index
3.1 Applying the Youden Index as the target function in gradient boosting
3.2 The proposed new smooth function
3.3 Replacing the sign function with the new smooth function
3.4 The proposed new negative gradient
3.5 Parameter Setting
3.6 Programming Approach
Chapter 4 Results
4.1 Simulation Results
4.2 Bupa Liver Disease Data
4.2.1 Data Description
4.2.2 Cross Validation and Model
4.2.3 Results
4.3 Leukemia Data
4.3.1 Data Description
4.3.2 Cross Validation and Model
4.3.3 Results
4.4 Tuberculosis (TB) Data
4.4.1 Data Description
4.4.2 Cross Validation and Model
4.4.3 Results
Chapter 5 Discussion
References
Attachment
