National Digital Library of Theses and Dissertations in Taiwan

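Student: 吳旻翰 (Wu Min-Han)
Title: 以分群為基礎的純度插補與距離插補在醫療之應用
Title (English): Clustering-Based Purity Imputation and Distance Imputation in Medical Applications
Advisor: 鄭景俗 (Cheng, Ching-Hsue)
Oral examination committee: 鄭景俗, 陳重臣, 張景榮 (Cheng, Ching-Hsue; CHEN, JONG-CHEN; Chang, Jing-Rong)
Oral examination date: 2019-05-15
Degree: Master's
University: National Yunlin University of Science and Technology
Department: Department of Information Management
Discipline: Computing
Discipline subcategory: General computing
Thesis type: Academic thesis
Publication year: 2019
Graduation academic year: 107 (ROC calendar, i.e. 2018-2019)
Language: English
Pages: 73
Keywords (Chinese, translated): medical records; missing value imputation; clustering; purity imputation; distance imputation
Keywords (English): health records; data imputation; clustering; purity imputation; distance imputation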
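Chinese abstract (translated): People today care more and more about their health and, correspondingly, about the completeness of medical record data, and medical data imputation has been a very active research area in recent years. Medical records give physicians a basis for diagnosing a patient's actual symptoms, yet they are often incomplete because of human error, equipment failure, network disconnection, and similar causes, which makes data imputation increasingly important. Although a growing number of methods have been proposed, many of them simply delete missing values; whatever the size of the dataset, this can remove important fields and introduce serious bias. This study therefore proposes a clustering-based purity imputation and distance imputation method to improve accuracy. Eight medical datasets were collected, and experiments were run under different missing-data types and missing rates. The proposed method was compared with common imputation methods, and the most suitable number of clusters was determined with k-means clustering, the elbow method, and the average silhouette method. The experimental results show that clustering greatly improves imputation accuracy and that the proposed method achieves better accuracy at every missing rate.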
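The abstract's argument against deleting incomplete records can be quantified with a small simulation. The following Python sketch is illustrative only: the function names (`inject_mcar`, `inject_mar`, `inject_mnar`, `complete_case_fraction`), the synthetic data, and the particular MAR and MNAR rules are assumptions, not the thesis's experimental protocol. It injects missing values under the three mechanisms listed in the table of contents (MCAR, MAR, MNAR) and reports how many rows listwise deletion would keep. Under MCAR with a per-cell rate r and d columns, a row survives with probability (1 - r)^d, so with r = 0.1 and d = 8 only about 43% of rows remain.

```python
# Illustrative sketch: all names and data are hypothetical, not taken from the thesis.
import random


def inject_mcar(rows, rate, seed=0):
    """MCAR: every cell goes missing (None) independently with probability `rate`."""
    rng = random.Random(seed)
    return [[None if rng.random() < rate else v for v in row] for row in rows]


def inject_mar(rows, target, driver, threshold, rate, seed=0):
    """MAR: column `target` goes missing with probability `rate` only when the
    always-observed column `driver` exceeds `threshold`."""
    rng = random.Random(seed)
    out = []
    for row in rows:
        row = list(row)  # copy so the input rows are left untouched
        if row[driver] > threshold and rng.random() < rate:
            row[target] = None
        out.append(row)
    return out


def inject_mnar(rows, target, threshold, rate, seed=0):
    """MNAR: column `target` goes missing with probability `rate` when its own
    (now unobserved) value exceeds `threshold`."""
    rng = random.Random(seed)
    out = []
    for row in rows:
        row = list(row)
        if row[target] > threshold and rng.random() < rate:
            row[target] = None
        out.append(row)
    return out


def complete_case_fraction(rows):
    """Share of rows that listwise deletion would keep (no None in any column)."""
    return sum(all(v is not None for v in row) for row in rows) / len(rows)


if __name__ == "__main__":
    rng = random.Random(42)
    data = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(5000)]
    for rate in (0.05, 0.10, 0.20):
        kept = complete_case_fraction(inject_mcar(data, rate))
        print(f"MCAR rate {rate:.2f}: listwise deletion keeps {kept:.1%} of rows "
              f"(theory: {(1 - rate) ** 8:.1%})")
```

The MAR rule makes a column's missingness depend on another, observed column, while the MNAR rule makes it depend on the missing value itself; both are simple textbook-style instances rather than the rules used in the thesis.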
English abstract: Nowadays, people pay increasing attention to their health and, correspondingly, to the integrity of medical records. In recent years, the field of medical data imputation has been very active. Medical records give doctors a basis for diagnosing patients, but they are often incomplete for a variety of reasons, such as human error, equipment failure, and network disconnection, so data imputation has become increasingly important. Although many imputation methods have been proposed, many of them handle the problem by deleting missing values, which, regardless of the size of the dataset, can remove important fields and bias the results. Therefore, this study proposes a clustering-based method that combines purity imputation and distance imputation to improve accuracy. Eight medical datasets were collected; the accuracy of the proposed method was compared with that of common imputation methods under different conditions, and the optimal number of clusters was found with k-means, the Elbow Method, and the Average Silhouette Method. The experimental results show that clustering before imputation substantially improves imputation accuracy, and that the proposed method achieves high accuracy at every missing rate.
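The abstract selects the number of clusters with k-means together with the elbow method and the average silhouette method. The sketch below shows one way to combine the three in plain Python; it is not the thesis's code, and every name in it (`kmeans`, `best_of`, `choose_k`, and so on) is hypothetical. Two choices are assumptions: the elbow is located with a largest-second-difference heuristic on the within-cluster sum of squares (WCSS), whereas the elbow is often read off a plot, and each k is fitted from several random starts, keeping the lowest-WCSS run. The silhouette of point i is s(i) = (b - a) / max(a, b), where a is its mean distance to the other members of its cluster and b is its mean distance to the members of the nearest other cluster.

```python
# Illustrative sketch: all names and data are hypothetical, not taken from the thesis.
import math
import random


def kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd's k-means; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an emptied cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def wcss(points, labels, centroids):
    """Within-cluster sum of squared distances (the quantity the elbow method plots)."""
    return sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))


def best_of(points, k, n_init=10):
    """Run k-means from several random starts and keep the lowest-WCSS result."""
    runs = [kmeans(points, k, seed=s) for s in range(n_init)]
    return min(runs, key=lambda run: wcss(points, *run))


def average_silhouette(points, labels):
    """Mean of s(i) = (b - a) / max(a, b); points in singleton clusters contribute 0."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        return 0.0  # silhouette is undefined for a single cluster; score it as 0
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)  # excludes p itself
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for other_lab, other in clusters.items() if other_lab != lab)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def choose_k(points, k_max=6):
    """Return (elbow_k, silhouette_k) for k-means on `points`."""
    fits = {k: best_of(points, k) for k in range(1, k_max + 1)}
    curve = {k: wcss(points, *fit) for k, fit in fits.items()}
    # Elbow heuristic: the k where the WCSS curve bends most (largest second difference).
    elbow_k = max(range(2, k_max),
                  key=lambda k: curve[k - 1] - 2 * curve[k] + curve[k + 1])
    # Average silhouette method: the k (>= 2) with the highest mean silhouette.
    sil_k = max(range(2, k_max + 1),
                key=lambda k: average_silhouette(points, fits[k][0]))
    return elbow_k, sil_k


if __name__ == "__main__":
    rng = random.Random(7)
    blobs = [(0.0, 0.0), (6.0, 0.0), (0.0, 6.0)]
    data = [(cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5))
            for cx, cy in blobs for _ in range(50)]
    print("elbow k, silhouette k:", choose_k(data))
```

The silhouette needs at least two clusters, so it is scored from k = 2, and the second-difference elbow needs a neighbour on each side, so it only considers k = 2 through k_max - 1.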
Abstract (Chinese)
Abstract
Table of contents
List of tables
List of figures
1. Introduction
1.1 Background
1.2 Motivation and objectives
1.3 Organization of the thesis
2. Related work
2.1 Medical data imputation
2.2 Missing values type
2.3 Clustering
2.4 Imputation method
3. Proposed Method
3.1 Concept
3.2 Proposed imputation method
Step 1. Data Collection
Step 2. Data preprocessing
Step 3. Clustering
Step 4. Data Imputation
Step 5. Classification and Evaluation
4. Experiment and result
4.1 Experimental environment
4.2 Experiment datasets
4.3 Experiment and Result
4.3.1 Internal experiment
1. MAR and MCAR experiment
2. MNAR experiment
4.3.2 External experiment
4.3.3 RMSE
4.4 Finding
5. Conclusion
References
Appendices
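This page does not define purity imputation or distance imputation, so the sketch below does not reproduce them. It only illustrates the abstract's finding that clustering before imputation helps, using two generic stand-ins, a within-cluster mean fill and a within-cluster nearest-neighbour fill, against a dataset-wide mean baseline, and it scores each with RMSE computed over the masked cells only (an assumption about how the RMSE evaluation listed under 4.3.3 is applied). Cluster labels are passed in directly; in the pipeline described in the abstract they would come from k-means. All function names and the synthetic data are hypothetical.

```python
# Illustrative sketch: generic cluster-based fills, NOT the thesis's purity/distance imputation.
import math
import random


def observed_mean(rows, j):
    """Mean of the observed (non-None) values in column j, or None if none are observed."""
    vals = [r[j] for r in rows if r[j] is not None]
    return sum(vals) / len(vals) if vals else None


def impute_global_mean(rows):
    """Baseline without clustering: fill each gap with the dataset-wide column mean."""
    means = [observed_mean(rows, j) for j in range(len(rows[0]))]
    return [[means[j] if v is None else v for j, v in enumerate(r)] for r in rows]


def impute_cluster_mean(rows, labels):
    """Fill each gap with the column mean of the row's own cluster
    (falling back to the global mean if the cluster never observes that column)."""
    width = len(rows[0])
    global_means = [observed_mean(rows, j) for j in range(width)]
    groups = {}
    for r, lab in zip(rows, labels):
        groups.setdefault(lab, []).append(r)
    means = {}
    for lab, members in groups.items():
        per_col = [observed_mean(members, j) for j in range(width)]
        means[lab] = [g if m is None else m for m, g in zip(per_col, global_means)]
    return [[means[lab][j] if v is None else v for j, v in enumerate(r)]
            for r, lab in zip(rows, labels)]


def impute_cluster_nearest(rows, labels):
    """Fill each gap from the nearest row in the same cluster that observes the
    column; distance is the RMS difference over the columns both rows observe."""
    def distance(a, b):
        diffs = [(x - y) ** 2 for x, y in zip(a, b) if x is not None and y is not None]
        return math.sqrt(sum(diffs) / len(diffs)) if diffs else math.inf

    out = []
    for i, (r, lab) in enumerate(zip(rows, labels)):
        filled = list(r)
        for j, v in enumerate(r):
            if v is None:
                donors = [d for k, d in enumerate(rows)
                          if k != i and labels[k] == lab and d[j] is not None]
                # Fall back to the global column mean if no same-cluster donor exists.
                filled[j] = (min(donors, key=lambda d: distance(r, d))[j]
                             if donors else observed_mean(rows, j))
        out.append(filled)
    return out


def rmse(truth, imputed, incomplete):
    """RMSE computed only over the cells that were missing in `incomplete`."""
    errs = [(imputed[i][j] - truth[i][j]) ** 2
            for i, row in enumerate(incomplete)
            for j, v in enumerate(row) if v is None]
    return math.sqrt(sum(errs) / len(errs))


if __name__ == "__main__":
    rng = random.Random(3)
    truth, labels = [], []
    # Two synthetic groups; within each, the three columns share a latent factor z.
    for lab, center in enumerate((0.0, 10.0)):
        for _ in range(100):
            z = rng.gauss(0, 1)
            truth.append([center + 2 * z + rng.gauss(0, 0.3) for _ in range(3)])
            labels.append(lab)  # stand-in for k-means cluster labels
    incomplete = [[None if rng.random() < 0.2 else v for v in row] for row in truth]
    for name, filled in [("global mean", impute_global_mean(incomplete)),
                         ("cluster mean", impute_cluster_mean(incomplete, labels)),
                         ("cluster nearest", impute_cluster_nearest(incomplete, labels))]:
        print(f"{name:16s} RMSE = {rmse(truth, filled, incomplete):.3f}")
```

In this synthetic setup the columns within each group share a latent factor, which is the kind of structure a distance-based fill can exploit; on data without such structure, a nearest-neighbour fill can do worse than the cluster mean.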

