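Author: Keng-Yuan Chang (張耕源)
Thesis title: Efficient Sequential Pattern Mining by Breadth-First Approach
Thesis title (Chinese): 廣度優先之序列性規則資料探勘方法
Advisor: 李瑞庭
Degree: Master's
Institution: National Taiwan University
Department: Graduate Institute of Information Management
Academic field: Computer science
Academic subfield: General computer science
Thesis type: Academic thesis
Year of publication: 2004
Graduation academic year: 92 (ROC calendar)
Language: English
Number of pages: 26
Keywords (Chinese): 序列性規則; 封閉集合; 資料探勘
Keywords (English): sequential pattern; closed set; data mining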
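Abstract (translated from the Chinese): Since the GSP algorithm was proposed, many related algorithms have followed, most of them focused on finding all sequential patterns. The CloSpan algorithm was the first to propose mining the closed set, which is more compact than the full set yet has the same expressive power. CloSpan therefore builds on the PrefixSpan algorithm and adds two pruning techniques, which it calls backward sub-pattern and backward super-pattern, to find the closed set efficiently.

We propose a new algorithm for mining the closed set. Unlike earlier algorithms, most of which use a depth-first strategy, ours is breadth-first. In addition, few earlier algorithms explicitly exploit item ordering to make pattern discovery more efficient. We use a positional data list to preserve the item ordering, use this data to help generate patterns, and on that basis propose two pruning techniques: the backward super-pattern condition and the same positional data condition. To keep the lattice that stores the final results correct and compact, we also handle several special cases. Experimental results show that our algorithm outperforms CloSpan on medium-to-large databases and at low support thresholds.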
Abstract (English): Since the GSP algorithm was proposed for mining sequential patterns in sequence databases, many methods have followed, most of them focused on mining the complete set of frequent patterns. The CloSpan algorithm was the first to observe that the closed set of sequential patterns is more compact than the full set while retaining the same expressive power. Building on the PrefixSpan algorithm, CloSpan adds two pruning techniques, backward sub-pattern and backward super-pattern, to mine the closed set efficiently.

In this thesis, we propose a new sequential pattern mining algorithm for mining closed sequences. Instead of the depth-first search used by many previous methods, we adopt a breadth-first approach. Moreover, previous methods seldom exploit item ordering to improve efficiency. We use a list of positional data to preserve the item-ordering information. Using these positional data, we develop two main pruning techniques: the backward super-pattern condition and the same positional data condition. To ensure that the resulting lattice is correct and compact, we also handle several special conditions. In our experiments, the algorithm outperforms CloSpan on moderately large datasets and at low support thresholds.
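The ideas named in the abstract (breadth-first generation, positional data, closed patterns) can be illustrated with a small sketch. This is not the thesis's algorithm: it assumes every element of a sequence is a single item rather than an itemset, the function names `mine_closed` and `is_subsequence` are hypothetical, and the closure check is a naive post-filter rather than the backward super-pattern or same positional data conditions, whose details this record does not give.

```python
from collections import defaultdict


def is_subsequence(short, long):
    """Return True if `short` occurs in `long` in order (gaps allowed)."""
    it = iter(long)
    return all(x in it for x in short)


def mine_closed(db, min_sup):
    """Level-wise (breadth-first) mining of closed sequential patterns.

    db: list of sequences, each a list of single items (a simplification).
    Returns {pattern tuple: support} for the closed frequent patterns.
    """
    # Positional data: item -> {sequence id: ascending positions of the item}.
    pos = defaultdict(lambda: defaultdict(list))
    for sid, seq in enumerate(db):
        for i, item in enumerate(seq):
            pos[item][sid].append(i)

    # Level 1: pattern -> {sid: end position of its leftmost embedding}.
    level = {}
    for item, by_sid in pos.items():
        if len(by_sid) >= min_sup:
            level[(item,)] = {sid: p[0] for sid, p in by_sid.items()}
    items = sorted(item for (item,) in level)

    frequent = {}
    while level:
        # Record the whole level before generating any longer pattern.
        for pat, ends in level.items():
            frequent[pat] = len(ends)
        nxt = {}
        for pat, ends in level.items():
            for item in items:
                new_ends = {}
                for sid, end in ends.items():
                    # First occurrence of `item` after the current end.
                    later = [p for p in pos[item].get(sid, []) if p > end]
                    if later:
                        new_ends[sid] = later[0]
                if len(new_ends) >= min_sup:
                    nxt[pat + (item,)] = new_ends
        level = nxt

    # Naive closure filter: drop a pattern if a frequent pattern one item
    # longer contains it and has the same support.
    closed = {}
    for pat, sup in frequent.items():
        absorbed = any(
            len(q) == len(pat) + 1 and s == sup and is_subsequence(pat, q)
            for q, s in frequent.items()
        )
        if not absorbed:
            closed[pat] = sup
    return closed
```

For the four sequences abc, abc, ab, bc with a minimum support of 2, the sketch keeps b (support 4), ab (3), bc (3) and abc (2). The patterns a, c and ac are frequent but are dropped, because ab, bc and abc respectively contain them with the same support. Checking only patterns one item longer is enough, since support cannot increase as a pattern grows.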
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
Chapter 2 Literature Survey
2.1 GSP
2.2 PrefixSpan
2.3 SPAM
2.4 CloSpan
2.5 Discussion
Chapter 3 Breadth-first Sequential Patterns Mining Approach
3.1 Preliminary concepts
3.2 First stage and item ordering representation
3.3 Second stage and longer sequence generation
3.4 Third stage and search space pruning
3.5 Special conditions handling
3.6 The mining algorithm
Chapter 4 Experiments and Performance Evaluation
4.1 Synthetic data and parameters
4.2 Experiments
Chapter 5 Conclusions and Future Work
References
[1] F. Masseglia, F. Cathala, and P. Poncelet, “The PSP approach for mining sequential patterns”, In Proc. 1998 European Symp. Principles of Data Mining and Knowledge Discovery (PKDD’98), Nantes, France, September 1998, pp. 176-184.
[2] H. Mannila, H. Toivonen, and A. I. Verkamo, “Discovery of frequent episodes in event sequences”, Data Mining and Knowledge Discovery, vol. 1, no. 3, 1997, pp. 259-289.
[3] J. Ayres, J. Flannick, J. Gehrke, and T. Yiu, “Sequential pattern mining using a bitmap representation”, In Proc. Int. Conf. Knowledge Discovery and Data Mining (KDD’02), Edmonton, Alberta, Canada, July 2002, pp. 429-435.
[4] J. Pei, J. Han, B. Mortazavi-Asl, H. Pinto, Q. Chen, U. Dayal, and M.-C. Hsu, “PrefixSpan: Mining sequential patterns efficiently by prefix-projected pattern growth”, In Proc. Int. Conf. Data Engineering (ICDE’01), Heidelberg, Germany, April 2001, pp. 215-224.
[5] J. Han, J. Pei, B. Mortazavi-Asl, Q. Chen, U. Dayal, and M.-C. Hsu, “FreeSpan: Frequent pattern-projected sequential pattern mining”, In Proc. Int. Conf. Knowledge Discovery and Data Mining (KDD’00), Boston, MA, August 2000, pp. 355-359.
[6] J. Han, J. Pei, and Y. Yin, “Mining frequent patterns without candidate generation”, In Proc. ACM-SIGMOD Int. Conf. Management of Data (SIGMOD’00), Dallas, TX, May 2000, pp. 1-12.
[7] M. J. Zaki, “SPADE: An efficient algorithm for mining frequent sequences”, Machine Learning, vol. 42, no. 1-2, 2001, pp. 31-60.
[8] M. Leleu, C. Rigotti, J.-F. Boulicaut, and G. Euvrard, “GO-SPADE: Mining sequential patterns over datasets with consecutive repetitions”, In Proc. Int. Conf. Machine Learning and Data Mining (MLDM’03), Leipzig, Germany, July 2003, pp. 293-306.
[9] N. Pasquier, Y. Bastide, R. Taouil, and L. Lakhal, “Discovering frequent closed itemsets for association rules”, In Proc. 7th Int. Conf. Database Theory (ICDT’99), Jerusalem, Israel, January 1999, pp. 398-416.
[10] R. Agrawal and R. Srikant, “Fast algorithms for mining association rules”, In Proc. Int. Conf. Very Large Data Bases (VLDB’94), Santiago, Chile, September 1994, pp. 487-499.
[11] R. Agrawal and R. Srikant, “Mining sequential patterns”, In Proc. Int. Conf. Data Engineering (ICDE’95), Taipei, Taiwan, March 1995, pp. 3-14.
[12] R. Agrawal and R. Srikant, “Mining sequential patterns: Generalizations and performance improvements”, In Proc. 5th Int. Conf. Extending Database Technology (EDBT’96), Avignon, France, March 1996, pp. 3-17.
[13] X. Yan, J. Han, and R. Afshar, “CloSpan: Mining closed sequential patterns in large datasets”, In Proc. SIAM Int. Conf. on Data Mining (SDM’03), San Francisco, CA, May 2003.