跳到主要內容

臺灣博碩士論文加值系統

(216.73.216.171) 您好!臺灣時間:2026/04/09 09:57
字體大小: 字級放大   字級縮小   預設字形  
回查詢結果 :::

詳目顯示

: 
twitterline
研究生:林琬純
研究生(外文):Lin Wan-chuen
論文名稱:利用多維度限制式探勘關聯性規則
論文名稱(外文):Mining Association Rules with Multi-dimensional Constraints
指導教授:李瑞庭李瑞庭引用關係
指導教授(外文):Anthony J. T. Lee
學位類別:碩士
校院名稱:國立臺灣大學
系所名稱:資訊管理研究所
學門:電算機學門
學類:電算機一般學類
論文種類:學術論文
論文出版年:2003
畢業學年度:91
語文別:英文
論文頁數:67
中文關鍵詞:關聯性規則多維度限制式
外文關鍵詞:Association ruleMulti-dimensional constraint
相關次數:
  • 被引用被引用:0
  • 點閱點閱:257
  • 評分評分:
  • 下載下載:16
  • 收藏至我的研究室書目清單書目收藏:2
在資料探勘的領域中,關聯性規則探勘是一個很重要的議題,而影響關聯性規則探勘績效的關鍵在於如何有效率地找出頻繁項目集(Frequent Itemset Discovery)。頻繁項目集的挖掘通常會產生大量的頻繁項目集和關聯性規則,但由於使用者有興趣的通常只是其中的一小部份集合而已,就必須找出所有的頻繁項目集之後再做事後的過濾篩選。這樣不但會降低資料探勘的效能,在效率上也會大大損失。以限制式為基礎的資料探勘(Constraint-based Mining)解決了這個問題,在資料探勘過程中加入使用者所指定的限制式,直接找出使用者所需要的部份,以提高資料探勘的效能和效率。
在過去的研究中,通常只用單一屬性值來描述交易中所記錄的項目(Item)。在現實生活裡,使用者通常會想要同時記錄多於一個的項目屬性值,並對多個屬性值下限制式。有鑑於此,在這篇論文裡,我們記錄項目使之具有多個屬性值,並定義了數個多維度限制式。而且,我們也針對多維度限制式的特性做出討論,提出演算法E-CFG和GE-CFG來利用多維度限制式探勘關聯性規則。由實驗結果得知,我們所提出的方法在所有的實驗中皆優於FP-growth+演算法。
Association rule mining is an important issue in the area of data mining. Frequent itemset discovery is the key factor in performance of association rule mining. Frequent itemset mining algorithms often generate a very large number of frequent itemsets and rules, which reduce not only the efficiency but also the effectiveness of the mining algorithms since only the subset of the complete frequent itemsets and association rules is of interest to users, and users need additional post-processing to filter through a large number of mined rules to find the useful ones. Constraint-based mining enables users to concentrate on mining itemsets that are interesting to themselves, which improves the efficiency of mining tasks.
Previously proposed methods consider that items in transactions are characterized only by single attribute value. In the real world, users may want to keep records of items with respect to more than one attribute and impose constraints on multiple dimensional attributes. In this thesis, we enhance the item representation by associating items with a number of attributes, so-called multi-dimensional items. We have defined and characterized some multi-dimensional constraints. Moreover, we have discussed the properties of those constraints and developed algorithms E-CFG and its generic form, GE-CFG, for mining frequent itemsets with multi-dimensional constraints. The experimental results show that both E-CFG and GE-CFG algorithms outperform the FP-growth+ algorithm for all the cases.
Table of Contents............................................i
List of Figures............................................iii
List of Tables..............................................iv
Chapter 1 Introduction......................................1
Chapter 2 Review of Literature..............................4
2.1 Frequent Itemset Discovery............................4
2.1.1 Apriori Algorithm.................................5
2.1.2 FP-growth Algorithm...............................6
2.2 Constraint-based Frequent Itemset Discovery..........10
2.2.1 Constraint Category Definition...................11
2.2.2 Constrained Apriori (CAP)........................12
2.2.3 Constrained FP-growth (CFG)......................14
2.3 Discussion...........................................18
Chapter 3 Research Methodology.............................20
3.1 Data Structure of Multi-dimensional Items............20
3.2 Basic Definition.....................................20
3.3 Multi-dimensional Constraints........................21
3.4 Problem Definition...................................26
3.5 Algorithm............................................27
3.5.1 Overview.........................................27
3.5.1.1 Phase 1: Frequency Check.....................30
3.5.1.2 Phase 2: Constraint Check....................30
3.5.1.3 Phase 3: Conditional FP-tree Construction....30
3.5.2 Algorithms for Case 1............................31
3.5.2.1 Anti-monotone constraint Cam.................31
3.5.2.2 Monotone constraint Cm.......................37
3.5.3 Algorithms for Case 2............................41
3.5.3.1 Anti-monotone constraint Cam.................41
3.5.3.2 Monotone constraint Cm.......................47
3.5.4 Discussion.......................................51
Chapter 4 Experiments and Performance Evaluation...........52
4.1 Synthetic Data Generation and Experiment Parameters..52
4.2 Experiments of E-CFGA and GE-CFGA....................54
4.2.1 Effect of the Constraint Selectivity.............54
4.2.2 Effect of the Number of Transactions.............57
4.2.3 Effect of the Support Threshold..................58
4.3 Experiments of E-CFGM and GE-CFGM....................59
4.3.1 Effect of the Constraint Selectivity.............60
4.3.2 Effect of the Support Threshold..................61
Chapter 5 Conclusions and Future Work......................64
References..................................................66
[1]G. Grahne, L.V.S. Lakshmanan, and X. Wang, Efficient Mining of Constrained Correlated Sets, In Proc. Int. Conf. Data Engineering (ICDE’00), pages 512-521, Feb. 2000.
[2]J-F. Boulicaut and B. Jeudy, Using Constraint for Set Mining: Should We Prune or not? In Proceedings Bases de Données Avançées (BDA’00), pages 221-237, Oct. 2000.
[3]J. Han, G. Dong, and Y. Yin, Efficient Mining of Partial Periodic Patterns in Time Series Database, In Proc. IEEE Int. Conf. Data Engineering (ICDE’99), pages 106-115, March 1999.
[4]J. Han, J. Pei and Y. Yin, Mining Frequent Patterns without Candidate Generation, In Proc. ACM-SIGMOD Int. Conf. Management of Data (SIGMOD’00), pages 1-12, May 2000.
[5]J. Pei and J. Han, Can We Push More Constraints into Frequent Pattern Mining? In Proc. Int. Conf. Knowledge Discovery and Data Mining (KDD’00), pages 350-354, Aug. 2000.
[6]J. Pei, J. Han, and L.V.S. Lakshmanan, Mining Frequent Itemsets with Convertible Constraints, In Proc. IEEE Int. Conf. Data Engineering (ICDE’01), pages 433-442, Feb. 2001.
[7]R. Agrawal and R. Srikant, Fast Algorithms for Mining Association Rules, In Proc. Int. Conf. Very Large Data Bases (VLDB’94), pages 487-499, Sept. 1994.
[8]R. Agrawal and R. Srikant, Mining Sequential Patterns, In Proc. IEEE Int. Conf. Data Engineering (ICDE’95), pages 3-14, March 1995.
[9]R. Agrawal, T. Imielinski. A. Swami, Mining Association Rules between Sets of Items in Large Databases, In Proc. ACM-SIGMOD Int. Conf. Management of Data (SIGMOD’93), pages 207-216, May 1993.
[10]R. J. Bayardo, Efficiently Mining Long Patterns from Databases, In Proc. ACM-SIGMOD Int. Conf. Management of Data (SIGMOD’98), pages 85-93, June 1998.
[11]R. Srikant, Q. Vu, R. Agrawal, Mining Association Rules with Item Constraints, In Proc. Int. Conf. Knowledge Discovery and Data (KDD’97), pages 67-73, 1997.
[12]Raymond T. Ng, L.V.S. Lakshmanan. J. Han, and A. Pang, Exploratory Mining and Pruning Optimizations of Constrained Association Rules, In Proc. ACM-SIGMOD Int. Conf. Management of Data (SIGMOD’98), pages 13-24, June 1998.
QRCODE
 
 
 
 
 
                                                                                                                                                                                                                                                                                                                                                                                                               
第一頁 上一頁 下一頁 最後一頁 top