National Digital Library of Theses and Dissertations in Taiwan

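Student: 尹世泓
Student (English name): Jones Sai-Wang Wan
Title: 基於群聚及統計測試的概念漂移檢測方法 (A Concept Drift Detection Method Based on Clustering and Statistical Testing)
Title (English): Concept Drift Detection Based on Pre-Clustering and Statistical Testing
Advisor: 王勝德
Advisor (English name): Sheng-De Wang
Oral defense date: 2017-07-21
Degree: Master's
Institution: National Taiwan University
Department: Graduate Institute of Electrical Engineering
Discipline: Engineering
Field: Electrical and Information Engineering
Thesis type: Academic thesis
Year of publication: 2017
Graduation academic year: 105 (ROC calendar, i.e. 2016/17)
Language: English
Number of pages: 24
Keywords (Chinese): 概念漂移 (concept drift); 漂移檢測 (drift detection)
Keywords (English): concept drift; stream data mining; drift detection; unsupervised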
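Stream data mining is one of the common data mining approaches in today's applications. However, the nature of real-world streaming data, in particular concept drift, makes it challenging. To handle concept drift when data labels are unavailable, a drift detection method is necessary. In this thesis, we propose a drift detection method based on statistical testing, in which a clustering algorithm partitions the data as a preprocessing step and principal component analysis (PCA) is used for feature extraction to reduce the data dimensionality and shorten the execution time. Experimental results on synthetic and real-world streaming datasets show that clustering preprocessing improves the performance of drift detection and feature extraction, thereby improving detection performance and speeding up execution.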
Stream data mining is one of the common data mining approaches in real-world applications today. However, it is challenging because of the nature of real-world data streams, especially concept drift. To handle concept drift, a drift detection method is necessary when data labels are not accessible. In this thesis, we propose a drift detection method based on a statistical test, with clustering as a preprocessing step, and we reduce the execution time by using principal component analysis (PCA) for feature extraction. Experimental results on synthetic and real-world streaming data show that clustering preprocessing improves drift detection performance, and that feature extraction trades an insignificant loss in detection performance for a large speedup in execution time.
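The abstract does not specify which statistical test, clustering algorithm, or windowing scheme the thesis uses, so the following is only a minimal sketch of one plausible pipeline under stated assumptions, not the author's implementation. It assumes a fixed reference window and a current window of unlabeled data. PCA is fitted on the reference window, k-means (swapped in as the clustering step) partitions the projected reference data, and a two-sample Kolmogorov-Smirnov (KS) test, also an assumption, compares reference and current points per cluster and per principal component, with a Bonferroni correction. All names and parameters (`detect_drift`, `n_clusters`, `min_size`, and so on) are hypothetical.

```python
import numpy as np


def pca_fit(X, n_components):
    """Fit PCA on X via SVD; return (mean, components) with components as rows."""
    mean = X.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]


def pca_transform(X, mean, components):
    """Project X onto the fitted principal axes."""
    return (X - mean) @ components.T


def assign(X, centroids):
    """Index of the nearest centroid (squared Euclidean distance) for each row."""
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means (a hypothetical stand-in for the thesis's clustering)."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]  # copy, not a view
    for _ in range(n_iter):
        labels = assign(X, centroids)
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    return centroids


def ks_statistic(a, b):
    """Two-sample KS statistic D = sup_x |F_a(x) - F_b(x)| from empirical CDFs."""
    a, b = np.sort(a), np.sort(b)
    grid = np.concatenate([a, b])  # the supremum is attained at a sample point
    cdf_a = np.searchsorted(a, grid, side="right") / len(a)
    cdf_b = np.searchsorted(b, grid, side="right") / len(b)
    return float(np.abs(cdf_a - cdf_b).max())


def ks_rejects(a, b, alpha):
    """Asymptotic two-sample KS test: reject 'same distribution' when
    D > c(alpha) * sqrt((n + m) / (n * m)), c(alpha) = sqrt(-ln(alpha / 2) / 2)."""
    n, m = len(a), len(b)
    c_alpha = np.sqrt(-np.log(alpha / 2) / 2)
    return ks_statistic(a, b) > c_alpha * np.sqrt((n + m) / (n * m))


def detect_drift(ref, cur, n_components=2, n_clusters=3, alpha=0.05, min_size=20):
    """Hypothetical pipeline: PCA -> k-means on the reference window ->
    per-cluster, per-component KS tests between reference and current window."""
    mean, comps = pca_fit(ref, n_components)
    z_ref, z_cur = pca_transform(ref, mean, comps), pca_transform(cur, mean, comps)
    centroids = kmeans(z_ref, n_clusters)
    lab_ref, lab_cur = assign(z_ref, centroids), assign(z_cur, centroids)
    n_tests = n_clusters * n_components
    for j in range(n_clusters):
        a_j, b_j = z_ref[lab_ref == j], z_cur[lab_cur == j]
        if len(a_j) < min_size or len(b_j) < min_size:
            continue  # too few points in this cluster for a meaningful test
        for comp in range(n_components):
            # Bonferroni correction across all cluster x component tests.
            if ks_rejects(a_j[:, comp], b_j[:, comp], alpha / n_tests):
                return True
    return False


# Hypothetical synthetic example: most variance lies in the first two
# features, and the "drifted" window is shifted along feature 0.
rng = np.random.default_rng(0)
scales = np.array([5.0, 5.0, 0.5, 0.5, 0.5])
ref = rng.normal(size=(600, 5)) * scales
same = rng.normal(size=(600, 5)) * scales
drifted = rng.normal(size=(600, 5)) * scales + np.array([10.0, 0.0, 0.0, 0.0, 0.0])
print("no drift:", detect_drift(ref, same, alpha=0.001))
print("drifted:", detect_drift(ref, drifted, alpha=0.001))
```

Fitting PCA and the clusters on the reference window only keeps both windows in the same coordinate system and the same partition, and the PCA step cuts the number of KS tests from one per original feature to one per retained component. One limitation of this sketch is that a pure change in cluster proportions, with unchanged within-cluster distributions, would go unnoticed; a chi-square test on the per-cluster counts could be added to cover that case.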
Chapter 1 Introduction
Chapter 2 Related Work
2.1 Passive Solutions
2.2 Active Solutions
Chapter 3 Proposed Method
3.1 Feature Extraction
3.2 Clustering
3.3 Drift Detection
Chapter 4 Experimental Evaluation
4.1 Datasets
4.1.1 Synthetic datasets
4.1.2 Real-World datasets
4.2 Experiment 1. Performance of Preprocessing
4.3 Experiment 2. Performance of Proposed Method
4.4 Experiment 3. Execution time of Proposed Method
Chapter 5 Conclusion
References