臺灣博碩士論文加值系統 (National Digital Library of Theses and Dissertations in Taiwan)


Detailed Record

Student: Chen, Su-Hsuan (陳俗玄)
Title: Using Genetic Algorithm to Optimize the Diversity of Classifier Ensemble (運用基因演算法發展差異性極大化之集成式分類器)
Advisors: Su, Chao-Ton (蘇朝墩); Shiue, Yeou-Ren (薛友仁)
Degree: Master's
Institution: National Tsing Hua University (國立清華大學)
Department: Industrial Engineering and Engineering Management (工業工程與工程管理學系)
Discipline: Engineering
Field: Industrial Engineering
Thesis type: Academic thesis
Publication year: 2012
Graduation academic year: 100 (ROC calendar, i.e., AY 2011–2012)
Language: Chinese
Pages: 45
Keywords (Chinese): 集成式分類器、基因演算法、決策樹、資料探勘
Keywords (English): Classifier Ensemble; Genetic Algorithm; Decision Tree; Data Mining
Cited by: 4 · Views: 115 · Downloads: 0 · Bookmarked: 0
Classification is one of the core tasks of data mining. The classifiers commonly proposed in the literature, such as decision trees and artificial neural networks, are all individual classifiers. In recent years, many researchers have pointed out that a classifier ensemble, formed by combining several individual classifiers, can achieve better classification performance than any single classifier. Because an ensemble classifies by aggregating the outputs of its member classifiers into a final decision, the diversity among those classifiers becomes a key factor affecting classification performance.
Since diversity is believed to affect classification accuracy, and few previous studies have tried to maximize it explicitly, this study generates diversity by manipulating the training samples and proposes an ensemble algorithm, DECRTS, that uses a genetic algorithm to maximize the diversity among classifiers. DECRTS is compared experimentally with representative ensemble methods from the literature on 21 UCI benchmark datasets. The results show that, among the six algorithms examined, DECRTS achieves the best average classification accuracy (82.19%), and the improvement is statistically significant, indicating that DECRTS improves classification accuracy on most of the datasets studied. The experiments also show that methods that generate diversity in different ways can perform better on particular datasets.

Classification is one of the main tasks of data mining. The literature offers many classic base inducers for training classifiers, such as decision trees and neural networks, all of which are individual classifiers. In recent years, many studies have shown that a classifier ensemble, composed of more than one individual classifier, is more effective than any of its individual members. To classify a new sample, an ensemble combines the output of each individual classifier and then reaches a final decision; the diversity between the classifiers is therefore an important factor in classification accuracy.
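The combination scheme described above can be sketched in a few lines. The feature vectors and threshold classifiers below are illustrative assumptions, not the thesis's actual base inducers:

```python
# Toy sketch of a classifier ensemble: several individual classifiers
# each predict a label, and the ensemble combines their outputs by
# majority vote to reach the final decision.
from collections import Counter

# Three toy "individual classifiers": each maps a feature vector to a label.
def clf_a(x):  # thresholds on the first feature
    return 1 if x[0] > 0.5 else 0

def clf_b(x):  # thresholds on the second feature
    return 1 if x[1] > 0.3 else 0

def clf_c(x):  # a deliberately different rule, giving the ensemble diversity
    return 1 if x[0] + x[1] > 0.9 else 0

ensemble = [clf_a, clf_b, clf_c]

def majority_vote(x):
    """Combine the individual outputs and return the most frequent label."""
    votes = [clf(x) for clf in ensemble]
    return Counter(votes).most_common(1)[0][0]

print(majority_vote((0.6, 0.1)))  # clf_a votes 1, clf_b and clf_c vote 0 -> 0
```

Note that the vote only helps when the members err on different samples, which is exactly why diversity matters: three identical classifiers would vote unanimously and gain nothing over a single one.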
Because few studies have investigated how to maximize this diversity, this thesis proposes an ensemble method, DECRTS (Diversity by Evolutionary Computing Resampling Training Subsets), which uses a genetic algorithm to encourage diversity between classifiers by manipulating the training data set. We design an experiment over 21 datasets from the UCI Repository of machine learning databases and compare DECRTS with an individual classifier and with other classifier ensembles. The results show that DECRTS achieves the best average accuracy (82.19%) in our experiments and differs significantly from every other method except AdaBoost (81.99%). Moreover, the experiments show that methods that create diversity in different ways can perform better on particular datasets.
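A minimal sketch of the idea the abstract attributes to DECRTS: a genetic algorithm searches over resampled training subsets so that the classifiers trained on them disagree as much as possible. The toy dataset, the 1-nearest-neighbour base classifier, the disagreement-based fitness, and all GA parameters here are illustrative assumptions, not the thesis's actual implementation:

```python
# GA maximizing ensemble diversity by manipulating training subsets.
# A chromosome is K concatenated bitmasks, one per classifier, marking
# which training samples that classifier is trained on.
import random

rng = random.Random(0)

# Toy training data: (feature, label) pairs.
data = [(i / 10.0, 0 if i < 5 else 1) for i in range(10)]
K = 3            # number of classifiers in the ensemble
N = len(data)

def predict(subset_idx, x):
    """1-nearest-neighbour classifier trained on the chosen subset."""
    nearest = min(subset_idx, key=lambda i: abs(data[i][0] - x))
    return data[nearest][1]

def disagreement(chrom):
    """Fitness: mean pairwise disagreement of the K classifiers over the
    full training set (higher = more diverse)."""
    subsets = [[i for i in range(N) if chrom[k * N + i]] or [0] for k in range(K)]
    pairs = [(a, b) for a in range(K) for b in range(a + 1, K)]
    total = 0
    for a, b in pairs:
        for x, _ in data:
            total += predict(subsets[a], x) != predict(subsets[b], x)
    return total / (len(pairs) * N)

def evolve(pop_size=20, generations=30, p_mut=0.05):
    pop = [[rng.randint(0, 1) for _ in range(K * N)] for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(pop, key=disagreement, reverse=True)
        pop = scored[:2]                        # elitism: keep the two best
        while len(pop) < pop_size:
            p1, p2 = rng.sample(scored[:10], 2)  # select among the better half
            cut = rng.randrange(1, K * N)        # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [1 - g if rng.random() < p_mut else g for g in child]
            pop.append(child)
    return max(pop, key=disagreement)

best = evolve()
print(round(disagreement(best), 3))
```

In a full method the fitness would also have to reward accuracy, since maximizing disagreement alone can favour classifiers that are merely wrong in different places; the sketch isolates only the diversity-maximization step.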

Abstract (Chinese)
Abstract (English)
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Background and Motivation
1.2 Research Objectives
1.3 Thesis Organization
Chapter 2 Related Work
2.1 Ensemble Learning
2.1.1 Bagging
2.1.2 AdaBoost
2.2 Diversity
2.2.1 Diversity Measures
2.2.2 BUD
2.2.3 DECORATE
2.3 Decision Trees
2.4 Genetic Algorithms
2.4.1 Basic Principles of Genetic Algorithms
2.4.2 Steps of a Genetic Algorithm
2.4.3 Advantages of Genetic Algorithms
Chapter 3 Methodology
3.1 DECRTS
3.2 Experimental Design
3.2.1 Data Sources
3.2.2 Algorithms and Tools
3.2.3 Evaluation and Comparison Methods
Chapter 4 Experimental Results
4.1 Classification Accuracy
4.2 AUC Comparison
4.3 Discussion
Chapter 5 Conclusions and Future Work
5.1 Conclusions
5.2 Future Work
References

