Author: 謝千慧 (Chien-Hui Hsieh)
Title: 一個適用於概念漂移資料串流探勘法之研究 (An Efficient Approach for Mining Concept-Drifting Data Streams)
Advisor: 李建億 (Chien-I Lee)
Degree: Master's
Institution: 國立臺南大學 (National University of Tainan)
Department: 資訊教育研究所碩士班 (Graduate Institute of Information Education, Master's Program)
Discipline: Education
Field: Educational Technology
Document type: Academic thesis
Year of publication: 2004
Academic year of graduation: 92
Language: Chinese
Number of pages: 50
Keywords (Chinese): 概念漂移、漸進式學習、資料串流、資料探勘
Keywords (English): Concept Drift, Incremental Learning, Data Streams, Data Mining
Usage statistics:
  • Cited by: 3
  • Views: 312
  • Downloads: 25
  • Bookmarked in reading lists: 0
Abstract (Chinese): Mining data streams has become an emerging topic in knowledge discovery. Most mining algorithms assume that data follow a stationary distribution, yet many of today's large databases violate this assumption: as time passes, the data are very likely to exhibit concept drift. Although many learning algorithms use windowing techniques to detect concept drift and build appropriate prediction models, maintaining a window wastes much unnecessary system cost while the concept is stable; and even when the concept does drift, these techniques cannot anticipate it, correcting the prediction model only after its accuracy has dropped sharply. To address this problem, this thesis proposes a concept drift probing algorithm, called CDP-Tree (Concept Drift Probing Tree). Because CDP-Tree can detect concept drift in advance, it not only saves the unnecessary testing cost incurred while the concept is stable, but also improves the efficiency of rebuilding the prediction model and the classification accuracy when the concept drifts.
Abstract (English): Mining data streams has become a research topic of growing interest in knowledge discovery. Most learning algorithms assume that the data are a random sample drawn from a stationary distribution; however, most of the large databases available for mining today violate this assumption. The target class may change over time, a phenomenon known as concept drift. Many proposed algorithms try to recognize concept drift by maintaining a fixed or automatically adapted "window" on the training data. However, such methods incur a great deal of unnecessary computational cost while the concept is stationary. Moreover, when the concept drifts, window-based algorithms do not correct the previously produced classifier until its accuracy has dropped drastically below a pre-defined threshold, which makes them unsuitable for real-time applications. In this thesis, we propose a concept drift probing tree algorithm, called CDP-Tree, to detect concept drift in advance. It not only avoids unnecessary computational cost for stable data streams, but also efficiently rebuilds the classifier to improve its predictive accuracy when the data stream is unstable.
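The reactive, window-based strategy the abstract critiques can be sketched in a few lines: the classifier's recent hit rate is tracked over a sliding window, and a rebuild is signalled only once that rate falls below a pre-defined threshold. This is a minimal illustrative sketch, not code from the thesis; the class name, window size, and threshold are all assumptions chosen for the example.

```python
from collections import deque

class WindowedDriftMonitor:
    """Reactive drift detection: signal retraining only after accuracy
    over a sliding window drops below a fixed threshold."""

    def __init__(self, window_size=100, accuracy_threshold=0.7):
        self.window = deque(maxlen=window_size)   # recent prediction outcomes
        self.accuracy_threshold = accuracy_threshold

    def record(self, prediction, label):
        """Record one prediction outcome; return True if retraining is needed."""
        self.window.append(prediction == label)
        if len(self.window) < self.window.maxlen:
            return False  # not enough evidence yet
        accuracy = sum(self.window) / len(self.window)
        return accuracy < self.accuracy_threshold  # drift noticed only after the drop

# Example: a stream whose true concept flips at t = 100, while a stale
# classifier keeps predicting the old class.
monitor = WindowedDriftMonitor(window_size=50, accuracy_threshold=0.7)
drift_at = None
for t in range(200):
    label = 0 if t < 100 else 1   # concept drifts at t = 100
    prediction = 0                # stale classifier
    if monitor.record(prediction, label):
        drift_at = t
        break

print(drift_at)  # 115: detection lags the actual change by 15 steps
```

The 15-step lag between the true drift (t = 100) and its detection (t = 115) illustrates the cost the abstract attributes to window-based methods, and which CDP-Tree aims to avoid by probing for drift in advance rather than waiting for accuracy to collapse.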
Abstract (Chinese)
Abstract (English)
Acknowledgements
Table of Contents
List of Tables
List of Figures
Chapter 1: Introduction
  Section 1: Research Background
  Section 2: Research Motivation
  Section 3: Research Process
  Section 4: Thesis Organization
Chapter 2: Literature Review
  Section 1: Decision Rule Learning
  Section 2: Incremental Decision Rule Learning
  Section 3: Concept Drift Learning
Chapter 3: The Concept Drift Probing Decision Tree
  Section 1: Concept Stability, Concept Drift, and Concept Shift
  Section 2: Predicting Concept Drift
  Section 3: Concept Elements
  Section 4: The Concept Drift Probing Decision Tree
  Section 5: Parameter Tuning
Chapter 4: Performance Evaluation
  Section 1: The Mushroom Dataset
  Section 2: The STAGGER Concepts Dataset
Chapter 5: Conclusions and Future Work
  Section 1: Conclusions
  Section 2: Future Work
References
[1] L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone, “Classification and Regression Trees,” Chapman & Hall, New York, 1984.
[2] W. Cohen, “Fast Effective Rule Induction,” In Proc. 12th International Conference on Machine Learning, Tahoe City, CA., pp. 115-123. San Francisco, CA: Morgan Kaufmann, 1995.
[3] W. Cohen, “Learning Rules That Classify E-Mail,” In Machine Learning in Information Access: Papers from the 1996 AAAI Spring Symposium, Menlo Park, CA: AAAI Press, pp. 18-25, 1996, Technical Report SS96-05.
[4] P. Clark and T. Niblett, “The CN2 Induction Algorithm,” Machine Learning, Vol. 3, No. 4, pp. 261–283, 1989.
[5] P. Domingos and G. Hulten, “Mining High-Speed Data Streams,” In Proc. Association for Computing Machinery 6th International Conference on Knowledge Discovery and Data Mining, Boston, MA., pp. 71-80, Aug. 2000.
[6] J. Furnkranz and G. Widmer, “Incremental Reduced Error Pruning,” In Proc. of the 11th International Conference on Machine Learning, San Francisco, CA: Morgan Kaufmann, pp. 70-77, 1994.
[7] J. Han and M. Kamber, “Data Mining: Concepts and Techniques,” Morgan Kaufmann Publisher, 2001.
[8] G. Hulten, L. Spencer, and P. Domingos, “Mining Time-Changing Data Streams,” In Proc. 7th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA., pp. 97-106, Aug. 2001.
[9] M. B. Harries, C. Sammut and K. Horn, “Extracting Hidden Context,” Machine Learning, Vol. 32, No. 2, pp. 101-126, 1998.
[10] R. Jin and G. Agrawal, “Efficient Decision Tree Construction on Streaming Data,” In Proc. 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Washington, DC, pp. 571-576, Aug. 2003.
[11] I. Koychev, “Gradual Forgetting for Adaptation to Concept Drift,” In Proc. of ECAI 2000 Workshop Current Issues in Spatio-Temporal Reasoning, Berlin, Germany, 2000.
[12] R. Klinkenberg, “Using Labeled and Unlabeled Data to Learn Drifting Concepts,” In M. Kubat and K. Morik, editors, Workshop Notes of the IJCAI-01 Workshop on Learning from Temporal and Spatial Data, Menlo Park, CA: AAAI Press, pp. 16-24, 2001. Held in conjunction with the International Joint Conference on Artificial Intelligence (IJCAI).
[13] J.Z. Kolter and M. A. Maloof, “Dynamic Weighted Majority: A New Ensemble Method for Tracking Concept Drift,” In Proc. of the 3rd International IEEE Conference on Data Mining, Melbourne, FL., pp. 123-130, Dec. 2003.
[14] A. Kuh, T. Petsche, and R. L. Rivest, “Learning Time-Varying Concepts,” In Advances in Neural Information Processing Systems 3, Denver, CO., Nov. 1990. San Francisco, CA: Morgan Kaufmann, pp. 183-189, 1991.
[15] R. Klinkenberg and I. Renz, “Adaptive Information Filtering: Learning in The Presence of Concept Drifts,” In M. Sahami, M. Craven, T. Joachims, and A. McCallum, editors, Workshop Notes of the ICML-98 Workshop on Learning for Text Categorization, pp. 33–40, Menlo Park, CA., AAAI Press, 1998.
[16] R. Kohavi and M. Sahami, “Error-Based and Entropy-Based Discretization of Continuous Features,” In Proc. 2nd International Conference on Knowledge Discovery and Data Mining, Portland, OR., pp. 114-119, 1996.
[17] T. Lane and C. E. Brodley, “Approaches to Online Learning and Concept Drift for User Identification in Computer Security,” In Proc. 4th International Conference on Knowledge Discovery and Data Mining, New York City, NY., pp. 259-263, Aug. 1998.
[18] R. S. Michalski, “On the Quasi-Minimal Solution of The General Covering Problem,” In Proc. 5th International Symposium on Information Processing, Bled, Yugoslavia, Vol. A3, pp. 125–128, 1969.
[19] T. M. Mitchell, “Machine Learning and Data Mining,” Communications of the ACM, Vol. 42, No. 11, pp. 30-36, Nov. 1999.
[20] M. Maloof, “Incremental Rule Learning with Partial Instance Memory for Changing Concepts,” In Proc. of the International Joint Conference on Neural Networks, Los Alamitos, CA: IEEE Press, Jul. 2003.
[21] P. M. Murphy and D. W. Aha, UCI Repository of Machine Learning Databases, http://www.cs.uci.edu/mlearn/mlrepository.html.
[22] R. S. Michalski and J. Larson, “Incremental Generation of VL1 Hypothesis: The Underlying Methodology and The Description of Program AQ11,” Department of computer Science, University of Illinois, Urbana, Technical Report UIUCDCS-F-83-905, 1983.
[23] O. Maron and A. Moore, “Hoeffding Races: Accelerating Model Selection Search for Classification and Function Approximation,” In J. D. Cowan, G. Tesauro, and J. Alspector, editors, Advances in Neural Information Processing Systems 6, Denver, CO., 1993. San Mateo, CA: Morgan Kaufmann, pp. 59-66, 1994.
[24] M. A. Maloof and R. S. Michalski, “Selecting Examples for Partial Memory Learning,” Machine Learning, Vol. 41, No. 1, pp. 27-52, 2000.
[25] M. A. Maloof and R. S. Michalski, “Incremental Learning with Partial Instance Memory,” In Foundations of intelligent systems, Lecture Notes in Artificial Intelligence, Vol. 2366, 16-27. Berlin: Springer-Verlag. (Proc. 13th International Symposium on Methodologies for Intelligent Systems, Lyon, France, June 27-29), 2002.
[26] J. R. Quinlan, “Learning Efficient Classification Procedures and Their Application to Chess End Games,” In Machine Learning: An Artificial Intelligence Approach, R. S. Michalski et al. (Eds.), Tioga Publishing, Palo Alto, CA, 1983.
[27] J. R. Quinlan, “Induction of Decision Trees,” Machine Learning, Vol. 1, No. 1, pp. 81-106, 1986.
[28] J. R. Quinlan, “C4.5: Programs for Machine Learning,” Morgan Kaufmann Publisher, San Mateo, CA, 1993.
[29] J. C. Schlimmer and D. H. Fisher, “A Case Study of Incremental Concept Induction,” In Proc. of the 5th National Conference on Artificial Intelligence, Philadelphia, PA., pp. 496-501, Aug. 1986. Morgan Kaufmann, 1986.
[30] J. Schlimmer and R. Granger, “Beyond Incremental Processing: Tracking Concept Drift,” In Proc. 5th National Conference on Artificial Intelligence, Philadelphia, PA., pp. 502-507, Aug. 1986. Morgan Kaufmann, Vol. 1, 1986.
[31] W. Street and Y. Kim, “A Streaming Ensemble Algorithm (SEA) for Large-Scale Classification,” In Proc. 7th International Conference on Knowledge Discovery and Data Mining. New York City, NY., pp. 377-382, Aug. 2001.
[32] C. Taylor, G. Nakhaeizadeh, and C. Lanquillon, “Structural Change and Classification,” in Workshop Notes on Dynamically Changing Domains: Theory Revision and Context Dependence Issues, 9th European Conference on Machine Learning (ECML’97), Prague, Czech Republic, pp. 67–78, 1997.
[33] P. E. Utgoff, “Incremental Induction of Decision Trees,” Machine Learning, Vol. 4, No. 2, pp. 161-186, 1989.
[34] P. E. Utgoff, “An Improved Algorithm for Incremental Induction of Decision Trees,” In Proc. 11th International Conference on Machine Learning, New Brunswick, NJ., pp. 318-325, Jul. 1994.
[35] P. Utgoff, N. Berkman, and J. Clouse, “Decision Tree Induction Based on Efficient Tree Restructuring,” Machine Learning, Vol. 29, No. 1, pp. 5-44, 1997.
[36] H. Wang, W. Fan, P. S. Yu, and J. Han, “Mining Concept-Drifting Data Streams Using Ensemble Classifiers”, in Proc. 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Washington, DC, pp. 226-235, Aug. 2003.
[37] G. Widmer and M. Kubat, “Learning in The Presence of Concept Drift and Hidden Contexts,” Machine Learning, Vol. 23, No. 1, pp. 69-101, 1996.