
臺灣博碩士論文加值系統 (National Digital Library of Theses and Dissertations in Taiwan)


Detailed Record

Author: 吳家齊
Author (English): Chia-Chi Wu
Title: 資源有限下的決策樹建構
Title (English): Resource-Constrained Decision Tree Induction
Advisor: 陳彥良
Advisor (English): Yen-Liang Chen
Degree: Doctoral
Institution: 國立中央大學 (National Central University)
Department: 資訊管理研究所 (Graduate Institute of Information Management)
Discipline: Computer Science
Field: General Computer Science
Thesis Type: Academic thesis
Publication Year: 2010
Graduation Academic Year: 98
Language: English
Pages: 97
Keywords (Chinese): 決策樹 (decision tree); 資料探勘 (data mining); 分類 (classification); 成本感知學習 (cost-sensitive learning)
Keywords (English): data mining; cost-sensitive learning; decision tree; classification
Usage statistics:
  • Cited by: 0
  • Views: 382
  • Downloads: 0
  • Bookmarked: 1
Abstract (translated from the Chinese): Classification is a very important research area in data mining. Among the many existing classifiers, the decision tree is probably the most popular and most frequently used classification model. Most existing decision-tree algorithms strive to maximize classification accuracy and minimize classification error. In many real-life applications, however, every step from building a decision tree on existing data to classifying future data with it may involve various kinds of cost or resource consumption. Depending on the problem at hand, we may also need to complete the classification task under limited resources. How to build the most suitable decision tree under limited resources is therefore an important issue. In this study, we first propose two algorithms that improve on traditional TDIDT (Top-Down Induction of Decision Trees). We then adopt an entirely new approach to the problem of multiple resource constraints: it first extracts all valid classification rules from the training dataset and then builds a decision tree from the extracted rules. We conducted a thorough experimental evaluation on real data; the results show that the proposed methods perform satisfactorily under different resource constraints.
Abstract (English): Classification is one of the most important research domains in data mining. Among existing classifiers, decision trees are probably the most popular and commonly used classification models. Most decision-tree algorithms aim to maximize classification accuracy and minimize classification error. In many real-world applications, however, various types of cost or resource consumption are involved both in inducing the decision tree and in classifying future instances. Furthermore, the problem we face may require us to complete the classification task with limited resources. Therefore, how to build an optimal decision tree under resource constraints becomes an important issue. In this study, we first propose two algorithms that improve on traditional TDIDT (Top-Down Induction of Decision Trees) algorithms. We then adopt a new approach to deal with multiple resource constraints: it first extracts association classification rules from the training dataset and then builds a decision tree from the extracted rules. Empirical evaluations were carried out on real datasets, and the results indicate that the proposed methods achieve satisfactory performance under different resource constraints.
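The dissertation's own algorithms are not reproduced on this page, but the trade-off it targets is easy to illustrate. The sketch below is a minimal, hypothetical example of resource-aware split selection: at each node, choose the attribute with the best information-gain-to-cost ratio among the tests that still fit the remaining budget. The function names, the rows-as-dicts data layout, and the single additive cost per attribute are assumptions made for illustration, not the methods proposed in the thesis.

```python
# Minimal sketch of cost-aware split selection (illustrative only; not the
# dissertation's algorithm). Attributes are categorical, each with a fixed
# acquisition cost, and a node may only use tests it can still afford.
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def info_gain(rows, labels, attr):
    """Information gain from splitting the node on attribute `attr`."""
    groups = {}
    for row, y in zip(rows, labels):
        groups.setdefault(row[attr], []).append(y)
    remainder = sum(len(g) / len(labels) * entropy(g)
                    for g in groups.values())
    return entropy(labels) - remainder

def pick_split(rows, labels, attrs, costs, budget):
    """Return the affordable attribute maximizing gain/cost, else None.

    `costs` maps attribute -> acquisition cost; `budget` is the resource
    remaining on this root-to-leaf path (both hypothetical inputs).
    """
    best, best_score = None, 0.0
    for a in attrs:
        if costs[a] > budget:          # this test is no longer affordable
            continue
        score = info_gain(rows, labels, a) / costs[a]
        if score > best_score:
            best, best_score = a, score
    return best

# Tiny demonstration with made-up medical-style tests.
rows = [{"fever": "yes", "xray": "clear"},
        {"fever": "no",  "xray": "clear"},
        {"fever": "yes", "xray": "shadow"},
        {"fever": "no",  "xray": "shadow"}]
labels = ["sick", "healthy", "sick", "healthy"]
costs = {"fever": 1.0, "xray": 50.0}   # hypothetical test costs
print(pick_split(rows, labels, ["fever", "xray"], costs, budget=10.0))
# -> fever (the x-ray exceeds the budget; fever also separates the classes)
```

Dividing gain by cost is the classic move in cost-sensitive induction, since it biases the tree toward cheap but informative tests; a time constraint or multiple resources, as studied in Chapters 3 and 5, would replace the single `budget` check with one check per constrained resource.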
Table of Contents
Chinese Abstract
ABSTRACT
Acknowledgements
1. INTRODUCTION
1.1. MOTIVATIONS AND RESEARCH OBJECTIVES
1.2. ORGANIZATION OF THE DISSERTATION
2. LITERATURE REVIEW
2.1. MISCLASSIFICATION-COST SENSITIVE DECISION TREES
2.1.1. Instance Weighting
2.1.2. Threshold Adjusting
2.2. TEST-COST SENSITIVE DECISION TREES
2.3. MISCLASSIFICATION-AND-TEST COST SENSITIVE DECISION TREES
2.4. COST-CONSTRAINED DECISION TREES
3. COST-EFFECTIVE CLASSIFICATION TREES UNDER A TIME CONSTRAINT
3.1. RESEARCH PROBLEM
3.2. PROBLEM DEFINITION
3.2.1. Training Data and Decision Tree
3.2.2. Attribute Cost, Misclassification Cost, and Attribute Time
3.3. ALGORITHM
3.4. EXAMPLE
3.5. PERFORMANCE EVALUATION
3.6. SUMMARY
4. BUILDING A COST-CONSTRAINED DECISION TREE WITH MULTIPLE CONDITION ATTRIBUTES
4.1. RESEARCH PROBLEM
4.2. PROBLEM DEFINITION
4.3. THE ALGORITHM
4.3.1. Multi-Dimensional Attribute (MDA)
4.3.2. Information Need and Distance
4.3.3. Multi-Dimensional Information Gain
4.3.4. Splitting Criterion
4.3.5. Label Assignment Test
4.3.6. Termination Condition
4.3.7. Updating the Bottom Nodes
4.4. PERFORMANCE EVALUATION
4.4.1. Performance Evaluation under Different Cost Constraints
4.4.2. Performance Evaluation with Different Numbers of Condition Attributes
4.5. SUMMARY
5. COST-SENSITIVE DECISION TREE WITH MULTIPLE RESOURCE CONSTRAINTS
5.1. RESEARCH PROBLEM
5.2. PROBLEM DEFINITION
5.3. ALGORITHM
5.3.1. Phase 1: Extract the Complete Set of Classification Rules
5.3.2. Phase 2: Build a Decision Tree with the Extracted Rules
5.3.3. Phase 3: Adjustment for Producing the Final Decision Tree
5.3.4. Time Complexity Analysis
5.4. PERFORMANCE EVALUATION
5.4.1. Performance Evaluation with a Single Constrained Resource
5.4.2. Performance Evaluation with Two Constrained Resources
5.4.3. Performance Evaluation with Multiple Constrained Resources
5.5. SUMMARY
6. CONCLUSION
REFERENCES