跳到主要內容

臺灣博碩士論文加值系統

(18.97.9.170) 您好!臺灣時間:2024/12/08 15:20
字體大小: 字級放大   字級縮小   預設字形  
回查詢結果 :::

詳目顯示

: 
twitterline
研究生:張家汶
研究生(外文):Chia-Wen Chang
論文名稱:EfficientAlgorithmsforMiningSequentialPatternswithTimeConstraints
論文名稱(外文):具時間限制之高效率序列樣式探勘演算法
指導教授:林明言
指導教授(外文):Ming-Yen Lin
學位類別:碩士
校院名稱:逢甲大學
系所名稱:資訊工程所
學門:工程學門
學類:電資工程學類
論文種類:學術論文
畢業學年度:94
語文別:英文
論文頁數:76
中文關鍵詞:序列樣式時間限制封閉序列樣式探勘循序樣式記憶體索引
外文關鍵詞:time constraintsequential patternssequence miningmemory indexclosed sequential pattern
相關次數:
  • 被引用被引用:1
  • 點閱點閱:352
  • 評分評分:
  • 下載下載:24
  • 收藏至我的研究室書目清單書目收藏:2
探勘序列樣式是資料探勘領域中的一個重要研究議題。序列樣式探勘可以從一個序列資料庫中找出所有頻繁的子序列樣式。在探勘的過程中,除了使用者指定的最小頻繁門檻值外,若能提供其他條件則可找出更準確的結果。大部分的條件限制能夠從一般方法找出來的序列樣式加以過濾而取出滿足條件的樣式。由於與時間相關的條件限制影響了序列樣式支持度的計算;亦即一個序列樣式必須在資料序列中滿足時間的條件,其支持度的計數才會增加。所以指定時間限制的樣式探勘必須設計有考慮時間的新演算法。本論文提出一個在記憶體中建立時間索引的演算法,稱METISP,來找出有時間限制的序列樣式。演算法考慮的時間條件可以指定最小、最大或相等的時間間隔、可滑動的時間區間與總持續的時間範圍之時間限制。METISP 將資料庫載入記憶體並建立時間的索引以進行有效率的樣式探勘。利用索引集合與樣式增長 (pattern-growth) 的策略,METISP 不需要產生任何的候選樣式或是子資料庫即可找出符合要求的樣式。這些時間的索引減少搜尋的範圍並加速了項目支持度的計算。此外,本論文也提出一個找尋具時間限制封閉樣式之新演算法,稱CTSP。CTSP藉由一個整合時間限制的雙向封閉檢查的方法來探勘指定最大、最小間隔及可滑動的時間區間的封閉序列樣式。探勘出來的封閉的樣式在呈現給使用者更精簡的結果下仍保有完整的資訊。為評估效能,METISP與CTSP在許多不同的資料庫與時間限制下與知名的GSP與DELISP演算法做比較。眾多實驗的結果顯示了即使當使用者指定了較低的門檻值或是探勘大型的的資料庫,METISP與CTSP都能有較好的執行效率。
Sequential pattern mining is one of the important issues in the research of data mining. The mining is to find out all the frequent sub-sequences in a sequence database. In order to have more accurate results, constraints in addition to the support threshold need to be specified in the mining. Most time-independent constraints can be handled, without modifying the fundamental mining algorithm, by retrieving qualified patterns from the discovered ones. Time-constraints, however, cannot be managed by retrieving patterns because the support computation of patterns must validate the time attributes for every data sequence in the mining process. Therefore, a memory time-indexing approach, called METISP, is proposed in this thesis to discover sequential patterns with time constraints including minimum gap, maximum gap, exact gaps, sliding window, and duration. METISP scans the database into memory and constructs time-index sets for effective processing. Utilizing the index sets and the pattern-growth strategy, METISP efficiently mines the desired patterns without generating any candidate or sub-database. The index sets narrow down the search space to the sets of designated in-memory data sequences, and speed up the counting to the indicated ranges of potential items. In addition, a novel algorithm, called CTSP, is also proposed for mining closed sequential patterns with time constraints. The closed patterns preserve the complete information with more compact representations. We have evaluated METISP algorithm and CTSP algorithm with the well-known GSP algorithm and the DELISP algorithm for various datasets and constraints. The comprehensive experiments show that METISP and CTSP both have better efficiency, even with low support thresholds and very large databases.
致謝...................................................i
中文摘要...............................................ii
Abstract...............................................iii
Table of Contents......................................iv
List of Figures........................................vi
List of Tables.........................................viii
Chapter 1 Introduction.................................1
1.1 Background.........................................1
1.2 Research Motivation................................4
1.3 Research Objectives................................6
1.4 Organization of this Thesis........................7
Chapter 2 Related Work.................................8
2.1 GSP Algorithm......................................9
2.2 SPADE and cSPADE Algorithms........................10
2.3 PrefixSpan and prefix-growth Algorithms............13
2.4 DELISP Algorithm...................................15
2.5 CloSpan Algorithm..................................16
2.6 BIDE Algorithm .....................................16
2.7 Par-CSP Algorithm..................................17
Chapter 3 METISP: MEmory Time-Indexing Sequential Pattern mining...18
3.1 Problem Statement..................................18
3.2 Terminology Used in the Proposed Algorithm.........20
3.3 METISP Algorithm: A Detailed Description...........23
3.4 An Illustrating Example of Mining Patterns Using METISP...26
3.5 Dealing with Extra-large Databases.................28
3.6 Difference between METISP Algorithm and prefix-growth Algorithm ....30
3.7 Experimental Evaluations...........................31
3.8 Performance Results................................32
3.9 Summary............................................40
Chapter 4 CTSP: Closed Time-constrained Sequential Pattern mining...41
4.1 Problem Statement..................................41
4.2 Terminology Used in CTSP...........................44
4.3 CTSP Algorithm: A Detailed Description.............47
4.4 An Illustrating Example of Mining Patterns Using CTSP.....50
4.5 Using CTSP for Mining Large Databases..............53
4.6 Experimental Evaluations...........................53
4.7 Performance Results................................54
4.8 Summary............................................59
Chapter 5 Conclusions and Future Work..................60
5.1 Contributions......................................60
5.2 Future Work........................................61
References.............................................63
[1] R. Agrawal, and R. Srikant, “Fast Algorithms for Mining Association Rules in Large Databases,” Proceedings of the 20th International Conference on Very Large Data Bases, pp. 487-499, 1994.
[2] R. Agrawal, and R. Srikant, “Mining Sequential Patterns,” Proceedings of the 11th International Conference on Data Engineering, pp. 3-14, Taipei, Taiwan,1995.
[3] R. Agrawal, and R. Srikant, “Mining Sequential Patterns: Generalizations and Performance Improvements,” Proceedings of the 5th International Conference on Extending Database Technology, pp. 3-17, Avignon, France, 1996.
[4] C. Antunes, and A.L. Oliveira, “Sequential Patterns Mining Algorithms: Trade-off between Speed and Memory,” Proceedings Of Second Intfl Workshop on Mining Graphs, Trees and Sequences (MGTS 2004), Pisa, Italy, 2004.
[5] J. Ayres, J. Flannick, J. Gehrke, and T. Yiu, “Sequential PAttern Mining using A Bitmap Representation,” Proceedings of the 8th International Conference on Knowledge Discovery and Data Mining, pp. 429-435, 2002.
[6] J. H. Chang, and W. S. Lee, “Efficient mining method for retrieving sequential patterns over online data streams,” Journal of Information Science, Volume 31, Issue 5, pp. 420-432, Oct. 2005.
[7] Y. L. Chen, M. C. Chiang, and M. T. Kao, “Discovering time-interval sequential patterns in sequence databases,” Expert Systems with Applications, Volume 25, Issue 3, pp. 343-354, Oct. 2003.
[8] Y. L. Chen, T. C. K. Huang, “Discovering fuzzy time-interval sequential patterns in sequence databases,” IEEE Transactions on Systems, Man, and Cybernetics, Part B, Volume 35, Number 5, pp. 959-972, Feb. 2005.
[9] G. Chen, X. Wu, and X. Zhu, “Sequential Pattern Mining in Multiple Streams,” Proceedings of the 5th IEEE International Conference on Data Mining, pp. 585-588, Nov. 2005.
[10] D. Y. Chiu, Y. H. Wu, and A. L. P. Chen, “An Efficient Algorithm for Mining Frequent Sequences by a New Strategy without Support Counting,” Proceedings of the 20th International Conference on Data Engineering, pp. 375-386, 2004.
[11] S. Cong, J. Han, and D. A. Padua, “Parallel mining of closed sequential patterns,” Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 562-567, Chicago, Illinois, USA, Aug. 2005.
[12] F. M. Facca, and P. L. Lanzi, “Mining interesting knowledge from weblogs: a survey,” Data & Knowledge Engineering, Volume 53, Number 3, pp. 225-241, Jun. 2005.
[13] M. N. Garofalakis, R. Rastogi, and K. Shim, “SPIRIT: Sequential Pattern Mining with Regular Expression Constraints,” Proceedings of the 25th International Conference on Very Large Data Bases, pp. 223-234, Edinburgh, Scotland, Sep. 1999.
[14] J. Han, J. Pei, Y. Yin, and R. Mao, “Mining Frequent Patterns without Candidate Generation: A Frequent-Pattern Tree Approach,” Data Mining and Knowledge Discovery, Volume 8, Issue 1, pp. 53-87, 2004.
[15] Ben. Kao, M. Zhang, C. L. Yip, D. W. Cheung, and U. M. Fayyad, “Efficient Algorithms for Mining and Incremental Update of Maximal Frequent Sequences,” Data Mining and Knowledge Discovery, Volume 10, Number 2, pp. 87-116, Mar. 2005.
[16] R. Kohavi, C. Brodley, B. Frasca, L. Mason, and Z. Zheng, “KDD-Cup 2000 organizers'' report: Peeling the onion,” SIGKDD Explorations, pp. 86-98, 2000.
[17] H.-C. Kum, J. Pei, W. Wang, and D. Duncan, “ApproxMAP: Approximate Mining of Consensus Sequential Patterns,” Proceedings of the Third SIAM International Conference on Data Mining, San Francisco, CA, USA, May 2003.
[18] M. Y. Lin, and S. Y. Lee, “Fast Discovery of Sequential Patterns through Memory Indexing and Database Partitioning,” Journal of Information Science and Engineering, Volume. 21, Number 1, pp. 109-128, Jan. 2005.
[19] M. Y. Lin, and S. Y. Lee, “Efficient Mining of Sequential Patterns with Time Constraints by Delimited Pattern-Growth,” Knowledge and Information Systems, Volume 7, Issue 4, pp. 499-514, May 2005.
[20] C. Luo, and S. M. Chung, “A Scalable Algorithm for Mining Maximal Frequent Sequences Using Sampling,” Proceedings of 16th IEEE International Conference on Tools with Artificial Intelligence, pp. 156-165, Nov. 2004.
[21] F. Masseglia, F. Cathala, and P. Poncelet, “The PSP Approach for Mining Sequential Patterns,” Proceedings of the 2nd European Symposium on Principles of Data Mining and Knowledge Discovery, pp. 176-184, Nantes, France, 1998.
[22] F. Masseglia, P. Poncelet, and M. Teisseire, “Pre-Processing Time Constraints for Efficiently Mining Generalized Sequential Patterns,” Proceedings of the 11th International Symposium on Temporal Representation and Reasoning, pp. 87-95, France, 2004.
[23] A. Marascu, and F. Masseglia, “Mining Sequential Patterns from Temporal Streaming Data,” Proceedings of the first ECML/PKDD workshop on Mining Spatio-Temporal Data, Porto, Portugal, Oct. 2005.
[24] S. Orlando, R. Perego, and C. Silvestri, “A new algorithm for gap constrained sequence mining,” Proceedings of the 2004 ACM Symposium on Applied Computing, pp. 540-547, Nicosia, Cyprus, 2004.
[25] J. Pei, J. Han, B. Moryazavi-Asl, H. Pinto, Q. Chen, U. Dayal, and M.-C. Hsu, “PrefixSpan: Mining Sequential Patterns Efficiently by Prefix-Projected Pattern Growth,” Proceedings of the 17th International Conference on Data Engineering, pp. 215-224, Heidelberg, Germany, Apr. 2001.
[26] J. Pei, J. Han, and W. Wang, “Mining sequential patterns with constraints in large databases,” Proceedings of the Eleventh International Conference on Information and Knowledge Management, pp. 18-25, 2002.
[27] T. Petre, Xifeng. Yan, and J. Han, “TSP: Mining top-k closed sequential patterns,” Knowledge and Information Systems, Volume 7, Issue 4, pp. 438-457, May 2005.
[28] M. Seno, and G. Karypis, “SLPMiner: An Algorithm for Finding Frequent Sequential patterns Using Length-Decreasing Support Constraint,” Proceedings of the 2002 IEEE International Conference on Data Mining, pp. 418-425, 2002.
[29] W. -G.. Teng, M.-S. Chen, and P. S. Yu, “A Regression-based Temporal Pattern Mining Scheme for Data Streams,” Proceedings of the 29th International Conference on Very Large Data Bases, pp. 93-104, Berlin, Germany, Sep. 2003.
[30] J. Wang, and J. Han, “BIDE: Efficient Mining of Frequent Closed Sequences,” Proceedings of the 20th International Conference on Data Engineering, pp. 79-90, Boston, Massachusetts, Mar. 2004.
[31] X. Yan, J. Han, and R. Afshar, “CloSpan: Mining Closed Sequential Patterns in Large Databases,” Proceedings of the Third SIAM International Conference on Data Mining, San Francisco, CA, USA, May 2003.
[32] C. C. Yu, and Y. L. Chen, “Mining sequential patterns from multi-dimensional sequence data,” IEEE Trans on Knowledge and Data Engineering, Volume 17, Issue 1, pp. 136-140, Jan. 2005.
[33] M. J. Zaki, “SPADE: An Efficient Algorithm for Mining Frequent Sequences,” Machine Learning Journal, Volume 42, Issue 1-2, pp. 31-60, Jan. 2001.
[34] M. J. Zaki, “Sequence Mining in Categorical Domains: Incorporating Constraints,” Proceedings of the 9th International Conference on Information and Knowledge Management, pp. 422-429, Nov. 2000.
QRCODE
 
 
 
 
 
                                                                                                                                                                                                                                                                                                                                                                                                               
第一頁 上一頁 下一頁 最後一頁 top