
臺灣博碩士論文加值系統 (National Digital Library of Theses and Dissertations in Taiwan)


Detailed Record

Author: 吳家齊
Author (English): Chia-Chi Wu
Title: 資源有限下的決策樹建構
Title (English): Resource-Constrained Decision Tree Induction
Advisor: 陳彥良
Advisor (English): Yen-Liang Chen
Degree: Doctoral
Institution: 國立中央大學 (National Central University)
Department: 資訊管理研究所 (Graduate Institute of Information Management)
Discipline: Computer Science
Field: General Computer Science
Thesis Type: Academic thesis
Publication Year: 2010
Graduation Academic Year: 98
Language: English
Pages: 97
Keywords (Chinese): 決策樹 (decision tree); 資料探勘 (data mining); 分類 (classification); 成本感知學習 (cost-sensitive learning)
Keywords (English): data mining; cost-sensitive learning; decision tree; classification
Usage statistics:
  • Cited by: 0
  • Views: 382
  • Downloads: 0
  • Bookmarked: 1
Abstract (translated from the Chinese): Classification is a very important research area in data mining. Among the many existing classifiers, the decision tree is probably the most popular and most frequently used classification model. Most existing decision-tree algorithms strive to maximize classification accuracy and minimize classification error. In many real-life applications, however, every step from building a decision tree on existing data to classifying future data with it may involve various kinds of cost or resource consumption. Depending on the problem at hand, we may also need to complete the classification task under limited resources. How to build the most suitable decision tree under limited resources is therefore an important issue. In this study, we first propose two algorithms that improve on traditional TDIDT (Top-Down Induction of Decision Trees). We then adopt an entirely new approach to the problem of multiple resource constraints: it first extracts all valid classification rules from the training dataset and then builds a decision tree from the extracted rules. We conducted a thorough experimental evaluation on real data; the results show that the proposed methods perform satisfactorily under different resource constraints.
Abstract (English): Classification is one of the most important research domains in data mining. Among existing classifiers, decision trees are probably the most popular and commonly used classification models. Most decision-tree algorithms aim to maximize classification accuracy and minimize classification error. In many real-world applications, however, various types of cost or resource consumption are involved both in inducing the decision tree and in classifying future instances. Furthermore, the problem we face may require us to complete the classification task with limited resources. Therefore, how to build an optimal decision tree under resource constraints becomes an important issue. In this study, we first propose two algorithms that improve on traditional TDIDT (Top-Down Induction of Decision Trees) algorithms. We then adopt a new approach to deal with multiple resource constraints: it first extracts association classification rules from the training dataset and then builds a decision tree from the extracted rules. Empirical evaluations were carried out on real datasets, and the results indicate that the proposed methods achieve satisfactory performance under different resource constraints.
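The dissertation's own algorithms are not reproduced on this page, but the trade-off it targets is easy to illustrate. The sketch below is a minimal, hypothetical example of resource-aware split selection: at each node, choose the attribute with the best information-gain-to-cost ratio among the tests that still fit the remaining budget. The function names, the rows-as-dicts data layout, and the single additive cost per attribute are assumptions made for illustration, not the methods proposed in the thesis.

```python
# Minimal sketch of cost-aware split selection (illustrative only; not the
# dissertation's algorithm). Attributes are categorical, each with a fixed
# acquisition cost, and a node may only use tests it can still afford.
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def info_gain(rows, labels, attr):
    """Information gain from splitting the node on attribute `attr`."""
    groups = {}
    for row, y in zip(rows, labels):
        groups.setdefault(row[attr], []).append(y)
    remainder = sum(len(g) / len(labels) * entropy(g)
                    for g in groups.values())
    return entropy(labels) - remainder

def pick_split(rows, labels, attrs, costs, budget):
    """Return the affordable attribute maximizing gain/cost, else None.

    `costs` maps attribute -> acquisition cost; `budget` is the resource
    remaining on this root-to-leaf path (both hypothetical inputs).
    """
    best, best_score = None, 0.0
    for a in attrs:
        if costs[a] > budget:          # this test is no longer affordable
            continue
        score = info_gain(rows, labels, a) / costs[a]
        if score > best_score:
            best, best_score = a, score
    return best

# Tiny demonstration with made-up medical-style tests.
rows = [{"fever": "yes", "xray": "clear"},
        {"fever": "no",  "xray": "clear"},
        {"fever": "yes", "xray": "shadow"},
        {"fever": "no",  "xray": "shadow"}]
labels = ["sick", "healthy", "sick", "healthy"]
costs = {"fever": 1.0, "xray": 50.0}   # hypothetical test costs
print(pick_split(rows, labels, ["fever", "xray"], costs, budget=10.0))
# -> fever (the x-ray exceeds the budget; fever also separates the classes)
```

Dividing gain by cost is the classic move in cost-sensitive induction, since it biases the tree toward cheap but informative tests; a time constraint or multiple resources, as studied in Chapters 3 and 5, would replace the single `budget` check with one check per constrained resource.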
Table of Contents
Chinese Abstract
ABSTRACT
Acknowledgements
1. INTRODUCTION
1.1. MOTIVATIONS AND RESEARCH OBJECTIVES
1.2. ORGANIZATION OF THE DISSERTATION
2. LITERATURE REVIEW
2.1. MISCLASSIFICATION-COST SENSITIVE DECISION TREES
2.1.1. Instance Weighting
2.1.2. Threshold Adjusting
2.2. TEST-COST SENSITIVE DECISION TREES
2.3. MISCLASSIFICATION-AND-TEST COST SENSITIVE DECISION TREES
2.4. COST-CONSTRAINED DECISION TREES
3. COST-EFFECTIVE CLASSIFICATION TREES UNDER A TIME CONSTRAINT
3.1. RESEARCH PROBLEM
3.2. PROBLEM DEFINITION
3.2.1. Training Data and Decision Tree
3.2.2. Attribute Cost, Misclassification Cost, and Attribute Time
3.3. ALGORITHM
3.4. EXAMPLE
3.5. PERFORMANCE EVALUATION
3.6. SUMMARY
4. BUILDING A COST-CONSTRAINED DECISION TREE WITH MULTIPLE CONDITION ATTRIBUTES
4.1. RESEARCH PROBLEM
4.2. PROBLEM DEFINITION
4.3. THE ALGORITHM
4.3.1. Multi-Dimensional Attribute (MDA)
4.3.2. Information Need and Distance
4.3.3. Multi-Dimensional Information Gain
4.3.4. Splitting Criterion
4.3.5. Label Assignment Test
4.3.6. Termination Condition
4.3.7. Updating the Bottom Nodes
4.4. PERFORMANCE EVALUATION
4.4.1. Performance Evaluation under Different Cost Constraints
4.4.2. Performance Evaluation with Different Numbers of Condition Attributes
4.5. SUMMARY
5. COST-SENSITIVE DECISION TREE WITH MULTIPLE RESOURCE CONSTRAINTS
5.1. RESEARCH PROBLEM
5.2. PROBLEM DEFINITION
5.3. ALGORITHM
5.3.1. Phase 1: Extract the Complete Set of Classification Rules
5.3.2. Phase 2: Build a Decision Tree with the Extracted Rules
5.3.3. Phase 3: Adjustment for Producing the Final Decision Tree
5.3.4. Time Complexity Analysis
5.4. PERFORMANCE EVALUATION
5.4.1. Performance Evaluation with a Single Constrained Resource
5.4.2. Performance Evaluation with Two Constrained Resources
5.4.3. Performance Evaluation with Multiple Constrained Resources
5.5. SUMMARY
6. CONCLUSION
REFERENCES