Author (English): Yu-Sheng Yu
Title (English): Predicting Clinical Outcomes of Critically Ill Patients in Intensive Care Units with Combination of Machine Learning Models
Advisor (English): Fei-Pei Lai
Committee (English): Yu-Chang Yeh, Lu-Cheng Kuo, Shanq-Jang Ruan, Dai-Lun Chiang
Keywords (English): intensive care unit, prediction, machine learning, mortality, length of stay


Method: A total of 4,228 critically ill adult patients were extracted from the NTUH CORE database. In the mortality model, after preprocessing, the data were trained with seven machine learning algorithms using both the whole dataset and a balanced dataset. Three models were selected for having the highest sensitivity, moderate precision, and the highest precision, respectively. In the length-of-stay classification model, we analyzed the distribution of length of stay across the whole dataset and divided the days into four classes for labeling. Multi-class machine learning prediction then yielded results for the four classes together with the corresponding predicted probabilities.


Background: Although different types of predictive scoring systems have been developed from statistical analysis of data collected from large numbers of patients, predicting clinical outcomes for patients in intensive care units remains an important and difficult challenge. In recent years, with the development of machine learning technology, various algorithms have provided more powerful model inference capabilities and have been used to analyze such data.

Objective: This study aimed to use a three-step strategy to improve the sensitivity and precision of mortality prediction for critically ill patients by combining three machine learning models, and to divide ICU length of stay into four classes so that a classification model, rather than a regression model, could be used to predict the outcome.
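The four-class labeling of length of stay can be sketched as a simple binning helper. Only the 7-day boundary is reported in the results; the other cut-points below are illustrative assumptions, not thesis values.

```python
def label_los_class(days, cutpoints=(3, 7, 14)):
    """Map one ICU length of stay (in days) to a class index 0-3.

    Class 0: < 3 days, 1: 3-7 days, 2: 7-14 days, 3: >= 14 days.
    Only the 7-day boundary appears in the thesis results; the other
    cut-points here are illustrative assumptions.
    """
    for k, cut in enumerate(cutpoints):
        if days < cut:
            return k
    return len(cutpoints)

print([label_los_class(d) for d in (1.5, 4.0, 10.0, 21.0)])  # [0, 1, 2, 3]
```

With labels in hand, the task becomes an ordinary four-class classification problem, which is why a multi-class classifier can replace a regression on raw days.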

Method: A total of 4,228 adult intensive care patients were extracted from the NTUH CORE database. In the mortality model, after data preprocessing, the data were trained with seven machine learning algorithms on both the whole dataset and a balanced dataset. Three models were selected for having the highest sensitivity, moderate precision, and the highest precision, respectively. In the LOS classification model, we analyzed the distribution of length of stay in the whole dataset and divided the days into four classes for labeling. Multi-class machine learning prediction then produced results for the four classes together with the corresponding predicted probabilities.

Result: In the mortality model, the highest-sensitivity model classified 588 of the 843 patients in the testing dataset into the low-risk group, with a mortality rate of 2.6% (95% CI, 1.4 to 4.2%); the other 255 patients went through the next step of prediction. After processing with the moderate-precision and highest-precision models, these 255 patients were further classified into a moderate-risk group (210 patients), a high-moderate-risk group (26 patients), and an adjusted high-risk group (19 patients), with mortality rates of 29% (95% CI, 23 to 35.7%), 73.1% (95% CI, 52.2 to 88.4%), and 94.7% (95% CI, 74 to 99.9%), respectively. In the LOS classification model, the weighted average F1-score was 0.604, and the model performed better for patients with LOS of less than 7 days than for those with LOS of more than 7 days.
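Binomial confidence intervals like those quoted above can be reproduced approximately with a Wilson score interval. This is a sketch under two stated assumptions: the thesis cites a sample-size calculator and may instead use an exact Clopper-Pearson interval, and the 15 deaths used below are back-calculated from the reported 2.6% of 588 low-risk patients, not taken from the thesis data.

```python
from math import sqrt

def wilson_ci(successes, n, z=1.96):
    """Wilson score confidence interval for a binomial proportion
    (z = 1.96 gives an approximate 95% interval)."""
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half

# Low-risk group: ~15 deaths among 588 patients (assumed from the 2.6% rate).
lo, hi = wilson_ci(15, 588)
print(f"{15 / 588:.1%} (95% CI, {lo:.1%} to {hi:.1%})")
```

The Wilson interval is preferred over the naive normal approximation when the proportion is near 0 or 1, as with the low-risk and adjusted high-risk groups here.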

Conclusion: This study showed that a three-step strategy combining the highest-sensitivity, moderate-precision, and highest-precision models enhanced the prediction of 30-day mortality in critically ill patients.
Thesis Committee Certification
Acknowledgements
Abstract (Chinese)
Chapter 1. Introduction
1.1 Background
1.2 Related works
1.3 Objective
Chapter 2. Architecture
2.1 Workflow
2.2 Three-step strategy prediction
2.3 Four-class prediction on LOS classification model
Chapter 3. Methods
3.1 Data source
3.2 Patient Selection
3.3 Data distribution of physiological data
3.4 Data distribution of ICU length of stay
3.5 Feature Selection and Feature Engineering
3.6 Imbalanced Data
3.7 Missing Data
3.8 K-Fold Validation
3.9 Classification Model
3.9.1 Logistic Regression
3.9.2 K-nearest neighbors
3.9.3 Decision Tree
3.9.4 Random Forest
3.9.5 Linear Discriminant Analysis
3.9.6 AdaBoost
3.9.7 XGBoost
3.10 Model Assessment
Chapter 4. Results
4.1 Characteristics of input data
4.2 Mortality model
4.2.1 Influence of sample ratio
4.2.2 Model selection
4.2.3 Three-step strategy prediction
4.2.4 Feature Importance
4.3 LOS classification model
Chapter 5. Discussion
5.1 Mortality model
5.2 LOS classification model
Chapter 6. Limitation
Chapter 7. Conclusions and future work