National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)


Detailed Record

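Graduate student: Huang, Hao-Hsuan (黃浩軒)
Thesis title: A Nearest Neighbors Field Method Based on Distance for Missing Value Imputation in Medical Application (以距離為基礎的最鄰近區插補方法在醫療之應用)
Advisor: Cheng, Ching-Hsue (鄭景俗)
Oral defense committee: Cheng, Ching-Hsue (鄭景俗); Chen, Jong-Chen (陳重臣); Chang, Jing-Rong (張景榮)
Oral defense date: 2018-05-30
Degree: Master's
Institution: National Yunlin University of Science and Technology (國立雲林科技大學)
Department: Department of Information Management
Discipline: Computer Science
Field: General Computer Science
Thesis type: Academic thesis
Publication year: 2018
Graduation academic year: 106 (ROC calendar, 2017-2018)
Language: English
Number of pages: 52
Keywords: Nearest Neighbor; missing value imputation; health care; stroke; IST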
In the medical field, missing data are common, and missing values affect the analyses and predictions that doctors and researchers draw from a dataset. Most current medical studies focus on building the prediction model with the highest accuracy and do not consider how stable a model remains under different missing rates and missing types. To address data incompleteness and stability, this study proposes a novel imputation method. With data completeness and ease of use in mind, the method is a distance-based nearest-neighbor approach for imputing missing values. In the public-dataset experiments, this study selected several UCI datasets of different dimensionality, generated versions with different missing rates and missing types, and compared the resulting accuracy with that of common imputation techniques. In the real-data experiment, the stroke dataset from the International Stroke Trial (IST) was used to verify whether the proposed method can be applied effectively in practice. The results show that the proposed method performs well across the simulated missing rates, missing types, and datasets. In addition, it reached 90 percent accuracy on the stroke dataset, which indicates that the method can be applied effectively to real-world datasets.
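The abstract does not spell out the proposed algorithm, so the following is only a minimal sketch of generic distance-based nearest-neighbor imputation, not the thesis's method. The function names (`mask_mcar`, `partial_distance`, `knn_impute`), the choice of Euclidean distance over jointly observed features, the mean of `k` donors, and the missing-completely-at-random masking are all assumptions made for illustration.

```python
import math
import random


def mask_mcar(rows, rate, seed=0):
    """Return a copy of rows with roughly `rate` of the cells set to None.

    Assumption: cells are removed completely at random (MCAR); the thesis
    also studies other missing types that are not modeled here.
    """
    rng = random.Random(seed)
    return [[None if rng.random() < rate else v for v in row] for row in rows]


def partial_distance(a, b):
    """Euclidean distance over features observed in both rows.

    The squared sum is rescaled to the full feature count so that pairs
    sharing few features are comparable to pairs sharing many.
    Returns inf when the rows share no observed feature.
    """
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf
    squared = sum((x - y) ** 2 for x, y in shared)
    return math.sqrt(squared * len(a) / len(shared))


def knn_impute(rows, k=3):
    """Fill each missing cell with the mean of that feature over the k
    nearest rows in which the feature is observed.

    Distances use the original (unfilled) rows. A cell with no usable
    donor is left as None. The input is not modified.
    """
    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            donors = [
                (partial_distance(row, other), other[j])
                for m, other in enumerate(rows)
                if m != i and other[j] is not None
            ]
            donors = [d for d in donors if d[0] != math.inf]
            donors.sort(key=lambda d: d[0])
            nearest = donors[:k]
            if nearest:
                filled[i][j] = sum(v for _, v in nearest) / len(nearest)
    return filled


if __name__ == "__main__":
    data = [
        [1.0, 2.0, 3.0],
        [1.1, None, 3.1],
        [9.0, 8.0, 7.0],
        [1.2, 2.2, 2.9],
    ]
    # The gap in the second row is filled from its two closest rows.
    print(knn_impute(data, k=2))
```

In an experiment like the one the abstract describes, `mask_mcar` would be applied to a complete dataset at several rates, each masked copy would be imputed, and a classifier's accuracy on the imputed data would be compared across imputation methods.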
Abstract (Chinese)
Abstract
Content
List of Tables
List of Figures
1. Introduction
1.1 Background
1.2 Motivation and objectives
1.3 Organization of the thesis
2. Related Work
2.1 Missing value in Healthcare
2.2 Nearest Neighbor Technique
2.3 Type of missing values
2.4 Imputation methods
3. Proposed Method
3.1 Concept
3.2 Proposed algorithm
Step 1. Data Collection
Step 2. Data preprocessing
Step 3. Data imputation
Step 4. Classification and Evaluation
4. Experiment and Results
4.1 Experimental environment
4.2 UCI Datasets Experimental
4.3 Stroke Datasets Experimental
4.4 Finding
5. Conclusion
References


