National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)

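Graduate student: 劉力瑋
Graduate student (foreign name): Li Wei
Thesis title: 不對稱分佈混合增加少數法及多專家分類法之研究-以攝護腺肥大及攝護腺癌偵測為例
Thesis title (foreign): Using Over-sampling and Multi-classifier Committee Approach for skewed class distribution – a case study of diagnosis model construction of Benign prostate hypertrophy and Cancer of prostate
Advisor: 吳帆
Advisor (foreign name): Fan Wu
Degree: Master's
Institution: National Chung Cheng University (國立中正大學)
Department: Graduate Institute of Information Management
Discipline: Computer Science
Field: General Computer Science
Thesis type: Academic thesis
Year of publication: 2006
Academic year of graduation: 95 (ROC calendar)
Language: English
Number of pages: 40
Chinese keywords: 不對稱分佈 (skewed distribution); 增加少數法 (over-sampling); 多專家分類器 (multi-classifier committee)
Foreign keywords: skewed distribution; Over-sampling; Multi-classifier Committee Approach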
For datasets whose classes are evenly distributed, prediction models built with existing data mining classification methods already reach a reasonable level of prediction accuracy. In real data mining studies, however, datasets often have a skewed class distribution. In clinical settings, healthy people usually far outnumber unhealthy ones, so the collected data are inherently skewed. Prediction models built from such skewed data often suffer from severe prediction bias.
Three known approaches address skewed distributions: under-sampling, over-sampling, and the multi-classifier committee approach. This research combines over-sampling with the multi-classifier committee approach and refines the details of both, with the goal of raising prediction accuracy on the minority class. Data on benign prostate hypertrophy and cancer of the prostate serve as the case study for testing the classification performance of the proposed method.
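As a rough illustration of the two ideas the abstract combines, the sketch below shows plain SMOTE-style over-sampling (a synthetic minority point is interpolated between a minority sample and one of its nearest minority neighbours) and a simple majority-vote committee. This record does not describe the thesis's improved SMOTE or its multi-classifier co-work scheme, so this is not the author's method; the function names `smote_like_oversample` and `committee_predict`, the parameter `k`, and the toy data are all assumptions made for illustration.

```python
import random
from collections import Counter


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority points (hypothetical helper).

    Each new point lies on the segment between a randomly chosen minority
    sample and one of its k nearest minority neighbours.
    Assumes `minority` holds at least two equal-length numeric tuples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of `base` by squared Euclidean distance
        neighbours = sorted(
            (p for j, p in enumerate(minority) if j != i),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, other)))
    return synthetic


def committee_predict(classifiers, x):
    """Return the label that most classifiers in the committee vote for."""
    votes = Counter(clf(x) for clf in classifiers)
    return votes.most_common(1)[0][0]


# Toy usage: four minority samples grown to ten by adding six synthetic ones.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
balanced_minority = minority + smote_like_oversample(minority, n_new=6)
```

In a full pipeline, each committee member would typically be trained on the over-sampled data (or on different subsets of it) before voting; here the classifiers are simply any callables that map a sample to a label.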
Chapter 1 Introduction
1.1 Background
1.2 Motivation
1.3 Organization of thesis
Chapter 2 Literature Review
2.1 Decision tree and Classification and Regression Tree
2.2 k-means Method
2.3 Skewed class distribution
2.4 Benign Prostate Hypertrophy and Cancer of Prostate
Chapter 3 Algorithm
3.1 Problem definition and prediction strategy
3.2 Modeling phase - Improved SMOTE
3.3 Predict phase - Multi-classifier co-work
Chapter 4 Simulation
4.1 Evaluation criteria
4.2 Experiment design and simulation
4.3 The case of prostate
Chapter 5 Conclusion
5.1 Conclusion and Achievement
5.2 Research Restriction
5.3 Future work
References