臺灣博碩士論文加值系統 (National Digital Library of Theses and Dissertations in Taiwan)


Detailed Record

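Graduate student: 張全男
Graduate student (English): Chuan-nan Chang
Thesis title: 使用概念漂移偵測之分類法來探勘隨時間變動之資料流
Thesis title (English): Classification of time-changing data streams based on concept drift detection
Advisor: 邱宏彬
Advisor (English): Hung-pin Chiu
Degree: Master's
Institution: 南華大學 (Nanhua University)
Department: Graduate Institute of Information Management
Discipline: Computer science
Subdiscipline: General computer science
Thesis type: Academic thesis
Year of publication: 2008
Graduation academic year: 96 (ROC calendar)
Language: Chinese
Number of pages: 99
Chinese keywords: data stream; concept drift; classification
English keywords: classification; data streams; concept drift; concept drift detection; detection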
Usage statistics:
  • Cited by: 1
  • Views: 408
  • Downloads: 36
  • Saved to bibliography lists: 0
  This thesis studies the classification of data streams in environments where the underlying concept drifts over time. Because such continuously growing data imposes a one-pass constraint, historical data cannot be revisited. Several applicable algorithms already exist, but they all focus on keeping the model timely (that is, most meaningful for the current point in time) and overlook two costs: the trial-and-error cost paid to preserve that timeliness, and the maintenance cost wasted while the concept is stable. Classification methods that detect concept drift can avoid these problems; however, limitations of the existing detection method may cause efficiency problems when drift must be detected in multi-class data. We therefore propose, on a statistical basis, an algorithm that uses the chi-square test as its detection method, called CDDC (Concept Drift Detection of Chi-Square); the names CDC and CDDC are used interchangeably hereafter. For drift repair, the algorithm revises the "attribute value-class-concept unit" notion to "attribute value-concept unit". We evaluate the effectiveness of this detection method experimentally and further subdivide the drift adjustment cases. The experiments show that, on multi-class classification problems, the approach does reduce the unnecessary maintenance cost caused by concept-unit comparison. It thereby avoids the growth in adjustment cost that may accompany an increasing number of classes, as well as the cost that coarse adjustment cases may incur, and offers a feasible solution to multi-class concept drift classification in high-speed data stream environments.
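  The abstract names the chi-square test as the drift detector but does not spell out the procedure. The following is a minimal sketch of a chi-square test of homogeneity applied to one attribute value, and it is not the thesis's CDC algorithm. It assumes that class counts for that attribute value are kept for a reference window and a recent window, and that drift is flagged at the 5% significance level. The function names, the two-window layout, and the small critical-value table are illustrative assumptions.

```python
from typing import Sequence, Tuple

# Upper 5% critical values of the chi-square distribution, keyed by
# degrees of freedom (assumption: at most six classes are compared).
CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070}


def chi_square_homogeneity(old: Sequence[int], new: Sequence[int]) -> Tuple[float, int]:
    """Return (statistic, degrees of freedom) for a 2 x k homogeneity test.

    `old` and `new` hold the per-class counts observed for a single
    attribute value in the reference window and the recent window.
    """
    if len(old) != len(new):
        raise ValueError("both windows must count the same classes")
    # Classes unseen in both windows contribute nothing and are dropped.
    pairs = [(o, n) for o, n in zip(old, new) if o + n > 0]
    total_old = sum(o for o, _ in pairs)
    total_new = sum(n for _, n in pairs)
    if total_old == 0 or total_new == 0 or len(pairs) < 2:
        return 0.0, 0  # not enough data to run the test
    grand = total_old + total_new
    stat = 0.0
    for o, n in pairs:
        column = o + n
        expected_old = total_old * column / grand
        expected_new = total_new * column / grand
        stat += (o - expected_old) ** 2 / expected_old
        stat += (n - expected_new) ** 2 / expected_new
    return stat, len(pairs) - 1


def drift_detected(old: Sequence[int], new: Sequence[int]) -> bool:
    """Flag drift when the class distribution differs at the 5% level."""
    stat, df = chi_square_homogeneity(old, new)
    if df == 0:
        return False
    return stat > CRITICAL_05[df]  # KeyError if more than six classes remain
```

  With four classes, `drift_detected([40, 10, 30, 20], [10, 40, 20, 30])` compares a statistic of 40.0 against the critical value 7.815 for 3 degrees of freedom and reports drift, whereas `[38, 12, 29, 21]` against the same reference window does not. A single test covers all classes of the attribute value at once, which is one way to read the abstract's move from "attribute value-class-concept unit" to "attribute value-concept unit".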
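Thesis Approval Certificate
Acknowledgments
Chinese Abstract
English Abstract
Table of Contents
  
Chapter 1 Introduction
Section 1 Research Background and Motivation
Section 2 Research Objectives
Section 3 Research Process
  
Chapter 2 Literature Review
Section 1 Classification
Section 2 Information Gain
Section 3 Data Stream Classification Algorithms
Section 4 The CDP-Tree Algorithm (Concept Drift Probing Tree)
Section 5 Statistical Testing: The Chi-Square Test of Homogeneity
  
Chapter 3 Research Method
Section 1 Concept Drift Types
Section 2 The Chi-Square Test and the CDP-Tree Test
Section 3 Concept Drift Cases
Section 4 The CDC Algorithm and Program Flow
Section 5 A Worked Example of the CDC Algorithm
  
Chapter 4 Experimental Analysis
Section 1 Experimental Data
Section 2 Experimental Design
Section 3 Concept Drift Detection on Two-Class Data
Section 4 Concept Drift Detection on Four-Class Data
Section 5 Analysis of the Experimental Data for Case 5
  
Chapter 5 Conclusions and Future Work
Section 1 Conclusions
Section 2 Future Work
  
References
1. Chinese-Language Sources
2. Western-Language Sources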