Author: Muhammad Nashrullah
Title: Building Knowledge-based Decision Tree by SVM and Entropy
Advisor: Yo-Ping Huang
Keywords: machine learning; decision tree; information gain
Abstract: In recent years, machine learning has become a popular research topic. Many applications combine techniques such as support vector machines (SVM), k-nearest neighbors (KNN), and deep learning across domains that include healthcare, economics, and agriculture. Healthcare researchers have built application systems to help clinicians diagnose diseases, but some models lack the flexibility to present their knowledge in a form that clinicians can readily interpret. To overcome this problem, we use a support vector machine, a supervised learning algorithm, with the radial basis function (RBF) kernel as a nonlinear classification model; it finds a separating hyperplane that maximizes the margin while tolerating small errors. From the patterns correctly classified by the SVM model, we then search for better split points. Each split point is used to calculate information gain, which is applied to select the principal features among all attributes. Finally, we construct a knowledge-based decision tree from the features ordered by information gain to classify unknown medical patterns. Simulation results on different data sets verify that the proposed model is effective and feasible for the classification of medical databases.
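The pipeline described in the abstract — train an RBF-kernel SVM, keep only the correctly classified patterns, then rank features by information gain over candidate split points — can be sketched as follows. This is a minimal illustration on synthetic data, not the thesis implementation; the library (scikit-learn), hyperparameters, and the choice of candidate split points are all assumptions.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic two-class data standing in for a medical data set (assumption):
# feature 0 carries most of the signal, feature 2 a little, the rest is noise.
X = rng.normal(size=(200, 4))
y = (X[:, 0] + 0.5 * X[:, 2] > 0).astype(int)

# Step 1: RBF-kernel SVM as the nonlinear classification model.
svm = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X, y)

# Step 2: keep only the patterns the SVM classifies correctly.
correct = svm.predict(X) == y
Xc, yc = X[correct], y[correct]

def entropy(labels):
    """Shannon entropy of a label vector, in bits."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(feature, labels, split):
    """Gain from splitting one feature column at a candidate split point."""
    left, right = labels[feature <= split], labels[feature > split]
    if len(left) == 0 or len(right) == 0:
        return 0.0
    weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
    return entropy(labels) - weighted

# Step 3: rank features by their best information gain; the top-ranked
# features become the upper nodes of the knowledge-based decision tree.
gains = {}
for j in range(Xc.shape[1]):
    candidates = np.unique(Xc[:, j])  # observed values as candidate splits
    gains[j] = max(information_gain(Xc[:, j], yc, s) for s in candidates)

for j, g in sorted(gains.items(), key=lambda kv: -kv[1]):
    print(f"feature {j}: best gain = {g:.3f}")
```

On this synthetic data the informative feature 0 should come out with the highest gain; in the thesis the same ordering over real attributes determines the structure of the decision tree.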
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Background
1.2 Previous Related Work
1.3 Problem Statement
1.4 Thesis Development
Chapter 2 Literature Review
2.1 Introduction
2.2 Support Vector Machine
2.3 Cross Validation
2.4 Entropy
2.5 Decision Tree
2.6 Performance Measurements
2.7 Missing Values
2.7.1 Types of Missing Values
2.7.2 Handling Missing Data
Chapter 3 Methods
3.1 Introduction
3.2 Data Set
3.2.1 Mammographic Mass Data Set
3.2.2 Vertebral Column Data Set
3.2.3 Diabetic Retinopathy Debrecen Data Set
3.3 Support Vector Machines
3.4 Observation of Split Points
3.4.1 Observation in Mammographic Mass Data Set
3.4.2 Observation in Vertebral Column Data Set
3.4.3 Observation in Diabetic Retinopathy Debrecen Data Set
3.5 Calculation of Information Gain
Chapter 4 Results and Discussion
4.1 Result of Mammographic Mass Data Set
4.2 Result of Vertebral Column Data Set
4.3 Result of Diabetic Retinopathy Debrecen Data Set
Chapter 5 Conclusions and Future Work
References
About the Author