跳到主要內容

臺灣博碩士論文加值系統

(98.84.18.52) 您好!臺灣時間:2024/10/14 03:16
字體大小: 字級放大   字級縮小   預設字形  
回查詢結果 :::

詳目顯示

: 
twitterline
研究生:黃聖坤
研究生(外文):Sheng-Kun Hwang
論文名稱:在變動的支持度中有效率的進行資料串流高頻項目集探勘之研究
論文名稱(外文):Efficient Mining of Frequent Patterns in Data Streams with Variable Support Thresholds
指導教授:林明言
指導教授(外文):Ming-Yen Lin
學位類別:碩士
校院名稱:逢甲大學
系所名稱:資訊工程所
學門:工程學門
學類:電資工程學類
論文種類:學術論文
論文出版年:2006
畢業學年度:94
語文別:英文
論文頁數:74
中文關鍵詞:高頻項目集摘要向量可變動支持度探勘資料串流
外文關鍵詞:frequent itemsetdata streamsynopsis vectorvariable support mining
相關次數:
  • 被引用被引用:0
  • 點閱點閱:185
  • 評分評分:
  • 下載下載:22
  • 收藏至我的研究室書目清單書目收藏:0
最近幾年中,在資料串流裡探勘高頻項目集是一個相當新穎的研究問題。過去的解決方法,大部分都是使用一個固定不變的最小門檻值,從資料串流中找尋高頻項目集。然而,在真實環境中,門檻值應該是要能配合使用者的需求以及資料特性而改變。在非串流的環境中,一旦改變了門檻值即代表需要對全部的交易做重新探勘的動作。然而,由於資料串流具有「只能掃描一次」的特性,它無法提供已丟棄的交易資料,因此在資料串流中欲進行重新探勘通常是無法完成的。
本論文中,我們提出了一個可處理變動支持門檻值的探勘資料串流高頻項目集的演算法,稱為VSMDS。我們利用一個摘要向量結構來高度壓縮並儲存過去的交易資料,並且僅在需要時再提出來使用。VSMDS可以配合使用者隨時變動的門檻值,找出從開始到目前為止的高頻項目集。此外,我們提出了一個具有傾斜摘要向量結構的演算法VSMTP,以處理當使用者在變動門檻值的同時,要求任意時間區間的高頻項目集的查詢。我們針對各種資料庫進行完整的實驗。實驗結果顯示我們所提出的方法在探勘資料串流中具有處理變動支持門檻值的能力並具高效率。
Mining frequent itemsets over data streams is an emergent research topic in recent years. Previous approaches generally use a fixed minimum support threshold to discover the patterns in the stream. However, the support threshold will be changed to cope with the needs of the users and the characteristics of the incoming data in reality. Changing the threshold implies a re-mining of the whole transactions in a non-streaming environment. Nevertheless, the "look-once" feature of the streaming data cannot provide the discarded transactions so that a re-mining on the stream is impossible.
In this thesis, we propose a method for variable support mining of frequent itemsets over the data stream, called VSMDS. A synopsis vector is constructed for maintaining statistics of past transactions and is invoked only when necessary. The VSMDS algorithm can find up-to-date frequent itemsets with the changeable support threshold. In addition, we propose algorithm VSMTP which has tilted synopsis vectors, to answer queries for frequent itemsets in any period of time with respect to variable support thresholds. We have conducted extensive and comprehensive experiments over various datasets. The experimental results show that both algorithms are efficient and capable of mining frequent itemsets over the data stream, with variable support thresholds.
誌 謝 i
中文摘要 ii
Abstract iii
Table of Contents iv
List of Figures vi
List of Tables viii
Chapter 1 Introduction 1
1.1 Background 1
1.2 Motivation 2
1.3 Research Objectives 3
1.4 Organization of the Thesis 4
Chapter 2 Related Work 5
2.1 Frequent Itemset Mining 5
2.1.1 Apriori Algorithm 5
2.1.2 FP-growth Algorithm 7
2.2 Data Stream Systems 10
2.3 Frequent Itemset Mining in Data Stream 11
2.3.1 Buffer-Trie-SetGen Algorithm 11
2.3.2 FP-stream Algorithm 12
2.3.3 DSM-FI Algorithm 13
2.4 Closed/Recent Itemset Mining in Data Stream 15
2.4.1 Moment Algorithm 15
2.4.2 estWin Algorithm 16
Chapter 3 VSMDS Algorithm 18
3.1 Problem Statement 18
3.2 Overview of the VSMDS Algorithm 20
3.3 Data Structures used in VSMDS Algorithm 22
3.4 The VSMDS Algorithm 26
3.5 VSMDS Algorithm: A running Example 29
3.6 Experimental Results 30
3.6.1 Synthetic DataSets 31
3.6.2 Real World DataSets 40
3.7 Summary 44
Chapter 4 VSMTP Algorithm 45
4.1 Problem Statement 45
4.2 Overview of the Proposed Algorithm 46
4.3 Data Structures used in VSMTP Algorithm 48
4.4 The VSMTP Algorithm 52
4.5 Dynamic Compression Technique 53
4.6 Experimental Evaluations 55
4.7 Summary 58
Chapter 5 Conclusion and Future Work 59
5.1 Contributions 59
5.2 Future Work 59
References 61
[1] R. Agrawal, T. Imielinski, and A. Swami, "Mining Association Rules between Sets of Items in Large Databases," Proceedings of the ACM SIGMOD International Conference on Management of Data, pp. 207-216, Washington D.C., May 1993.
[2] R. Agrawal and R. Srikant, "Fast Algorithm for Mining Association Rules," Proceedings of the 20th International Conference on Very Large Databases (VLDB’94), pp. 487-499, 1994.
[3] B. Babcock, S. Babu, M. Datar, R. Motwani, and J. Widom, "Models and Issues in data stream systems," Proceedings of the 2002 ACM Symposium on Principles of Database Systems (PODS’02), 2002.
[4] J. H. Chang, and W. S. Lee, "Online Data Stream Mining of Recent Frequent Itemsets by Sliding Window Method," Journal of Information Science, Vol. 31, Issue 2, pp. 76-90, Apr. 2005.
[5] M. Charikar, K. Chen, and M. Farach-Colton, "Finding Frequent Items in Data Stream," Proceedings of the 29th International Colloquium on Automata, Languages and Programming, pp. 693-703, Malaga, Spain, Jul. 2002.
[6] Y. Chi, and H. Wang, "Maintaining Closed Frequent Itemsets over a Stream Sliding Window," Proceedings of the 4th IEEE International Conference on Data Mining (ICDM''04), pp. 59-66, Brighton, United Kingdom, Nov. 01-04 2004.
[7] G. Cormode, S. Muthukrishnan, "What''s Hot and What''s Not: Tracking Most Frequent Items Dynamically," Proceedings of the 22nd ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, pp. 296-306, 2003.
[8] J. H. Chang and W. S. Lee, "A Sliding Window Method for Finding Recently Frequent Itemsets over Online Data Streams," Journal of Information Science and Engineering, Vol. 20, Issue 4, pp. 753-762, 2004.
[9] J. K. Chang and W. S. Lee, "Finding Recent Frequent Itemsets Adaptively over Online Data Streams," Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’03), Washington, DC, Aug. 24-27 2003.
[10] C. Giannella, J. Han, J. Pei, X. Yan, and P. S. Yu, "Mining Frequent Patterns in Data Streams at Multiple Time Granularities," Proceedings of the NSF Workshop on Next Generation Data Mining, 2002.
[11] L. Golab, and M. T. Özsu, "Issues in Data Stream Management," ACM SIGMOD Record, Vol. 32, Issue 2, pp. 5-14, 2003.
[12] M. M. Gaber, S. Krishnaswamy, and A. Zaslavsky," Adaptive Mining Techniques for Data Streams using Algorithm Output Granularity," The Australasian Data Mining Workshop (AusDM’03), Canberra, Australia, Dec. 2003.
[13] S. Guha, A. Meyerson, N. Mishra, R. Motwani, L. O’Callaghan, "Clustering Data Streams: Theory and Practice," IEEE Transactions on Knowledge and Data Engineering, Vol. 15, No. 3, May/Jun. 2003.
[14] M. M. Gaber, A. Zaslavsky, and S. Krishnaswamy, "Mining Data Stream: A Review," SIGMOD Record, Vol. 34, No. 2, Jun. 2005.
[15] C. Hidber, "Online Association Rule Mining," Proceedings of the 1999 ACM SIGMOD International Conference on Management of Data, pp. 145-156, Philadelphia, Pennsylvania, United States, 1999.
[16] J. Han, J. Pei, and Y. Yin, "Mining Frequent Patterns without Candidate Generation," Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data, Vol. 9, Issue 2, pp. 1-12, 1999.
[17] C. Jin, W. Qian, C. Sha, J. X. Yu, and A. Zhou, "Dynamically Maintaining Frequent Items over a Data Stream," Proceedings of the 12th ACM Conference on Information and Knowledge Management (CIKM’ 03).
[18] R. Jin and G. Agrawal, "Efficient Decision Tree Construction on Streaming Data," Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 571-576, Washington, DC, USA, Aug. 24-27 2003.
[19] M. Koyuturk, A. Grama, and N. Ramakrishnan, "Compression, Clustering and Pattern Discovery in Very High Dimensional Discrete-attribute Datasets," IEEE Transactions on Knowledge and Data Engineering, Vol. 17, no. 5, pp. 447-461, 2005.
[20] M. Koyuturk, and A. Grama, "PROXIMUS: A Framework for Analyzing Very High Dimensional Discrete-Attributed Datasets," Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’03), pp. 147-156, 2003.
[21] M. Khan, Q. Ding, and W. Perrizo, "K-nearest Neighbor Classification of Spatial Data Streams Using P-trees," Proceedings of the 6th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD’02), Taipei, Taiwan, May 6-8 2002.
[22] H. F. Li, S. Y. Lee, and M. K. Shan, "An Efficient Algorithm for Mining Frequent Itemsets over the Entire History of Data Streams," Proceedings of the First International Workshop on Knowledge Discovery in Data Streams, pp. 20-24, Pisa, Italy, Sep. 2004.
[23] M. Y. Lin, and S. Y. Lee, "Interactive Sequence Discovery by Incremental Mining," Journal of Information Sciences: An International Journal, Vol. 165, Issue 3-4, pp. 187-205, 2004.
[24] M. Y. Lin, and S. Y. Lee, "A Fast Lexicographic Algorithm for Association Rule Mining in Web Applications," Proceedings of the ICDCS Workshop on Knowledge Discovery and Data Mining in the World-Wide Web, pp. F7-F14, Taipei, Taiwan, R.O.C., 2000.
[25] C. H. Lee, C. R. Lin, and M. S. Chen, "Sliding-Window Filtering: An Efficient Algorithm for Incremental Mining," Proceedings of the ACM Tenth International Conference on Information and Knowledge Management (CIKM’01), Nov. 2001.
[26] H. F. Li, and S. Y. Lee, "Single-Pass Algorithms for Mining Frequency Change Patterns with Limited Space in Evolving Append-only and Dynamic Transaction Data Streams," Proceedings of 2004 IEEE International Conference on EEE’04, pp. 215-222, 2004.
[27] C. H. Lee, C. R. Lin, and M. S. Chen, "Sliding-window Filtering : An Efficient Algorithm for Incremental Mining," Proceedings of the ACM 10th International Conference on Information and Knowledge Management(CIKM’01), pp. 263-270, Atlanta, Georgia, USA, Oct. 2001.
[28] G. S. Manku and R. Motwani, "Approximate Frequency Counts over Data Streams," Proceedings of the 28th VLDB Conference, pp. 346-357, Hong Kong, China, Aug. 2002.
[29] E. H. Mohammad and R. Z. Osmar, "Inverted Matrix: Efficient Discovery of Frequent Items in Large Datasets in the Context of Interactive Mining," Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Washington, DC, USA, Aug. 24-27, 2003.
[30] K. Raymond and B. Hendrik, "Web Mining Research: A Survey," ACM SIGKDD Explorations, Vol. 2, Issue 1, pp. 1-15, Jun. 2000.
[31] M. K. Richard, S. Scott, and H. P. Christos, "A Simple Algorithm for Finding Frequent Elements in Streams and Bags," ACM Transactions on Database Systems (TODS), Vol. 28, Issue 1, pp. 51-55, Mar. 2003.
[32] J. Ruoming and G. Agrawal, "An Algorithm for In-Core Frequent Itemset Mining on Streaming Data," Proceedings of the 5th IEEE International Conference on Data Mining (ICDM’05), pp. 210-217, Houston, Texas, Nov. 2005.
[33] J. H. Su and W. Y. Lin, "CBW: An Efficient Algorithm for Frequent Itemset Mining," Proceedings of the 37th Annual Hawaii International Conference on System Sciences (HICSS''04), Aug. 5-8 2004.
[34] W. G. Teng, M. S. Chen, and P. S. Yu, "Resource-Aware Mining with Variable Granularities in Data Streams," Proceedings of the 4th SIAM International Conference on Data Mining, Apr. 22-24, 2004.
[35] X. Yu, Z. Chong, H. Lu, and A. Zhou, "False Positive or False Negative: Mining Frequent Itemsets from High Speed Transactional Data Streams," Proceedings of the 30th International Conference on Very Large Data Bases (VLDB''04), pp. 204-215, Toronto, Canada, Aug. 31-Sep. 3, 2004.
QRCODE
 
 
 
 
 
                                                                                                                                                                                                                                                                                                                                                                                                               
第一頁 上一頁 下一頁 最後一頁 top
無相關期刊