跳到主要內容

臺灣博碩士論文加值系統

(18.97.14.81) 您好!臺灣時間:2024/12/02 23:10
字體大小: 字級放大   字級縮小   預設字形  
回查詢結果 :::

詳目顯示

: 
twitterline
研究生:鄭憲徽
研究生(外文):Xian-hui Cheng
論文名稱:利用由上往下的交集方法與交易頻次表來改善PincerSearch演算法
論文名稱(外文):A Top-Down Intersection Method with Transaction Frequency Table to Improve Pincer-Search Algorithm
指導教授:楊東麟楊東麟引用關係
指導教授(外文):Don-lin Yang
學位類別:碩士
校院名稱:逢甲大學
系所名稱:資訊工程所
學門:工程學門
學類:電資工程學類
論文種類:學術論文
論文出版年:2006
畢業學年度:94
語文別:英文
論文頁數:47
中文關鍵詞:最大高頻項目組最大封閉式高頻項目組資料挖掘Pincer-search關聯式法則
外文關鍵詞:Data miningmaximal closed itemsetintersectionmaximal frequent itemsetassociation rulesPincer-search
相關次數:
  • 被引用被引用:0
  • 點閱點閱:322
  • 評分評分:
  • 下載下載:14
  • 收藏至我的研究室書目清單書目收藏:0
在資料挖掘的領域中,關聯式法則的應用變得越來越普遍了。越來越多的人投入這方面的研究,並有了顯著的進展。但是一般的演算法需要花費多次的資料庫掃描時間,或是在過程中產生大量的候選項目組。尋找最大高頻項目組(MFI: Maximal Frequent Itemset)的方法在近年成為很重要的研究議題。傳統找尋MFI的方法以Pincer search為代表,可以快速刪除候選項目組。但是其產生的候選項目組很多,會造成資料庫掃描的次數增加。CMFI [6] 就是針對這樣的缺點,結合了Closed set 的概念而產生。但是這樣的方法只針對Pincer search的Bottom-up做改良,節省的資料庫掃描次數有限。因此我們提出了BCTI (Bottom-up Closed and Top-down Intersection Mining)演算法來解決這些問題,主要是結合Pincer-Search、Closed set與Intersection的概念。
BCTI演算法應用Pincer-Search的上下搜尋的概念,set在由下往上的搜尋時,利用的Closed set的概念,來加快搜尋的速度。而在由上往下的搜尋時,則是利用Intersection的方法來找出可能的候選項目組。為了讓Intersection有更好的效率,我們還在第一次的資料庫掃描時先將資料庫整理為交易頻次表(Transaction Frequency Table)。交易頻次表中紀錄了每一筆交易在資料庫中出現的次數,並依其長度及字母的順序排序。我們的方法不但能快速搜尋到可能的最大高頻項目組,減少不必要的資料庫掃描時間,而且所花費的記憶體空間比使用樹狀結構(Tree structure)的演算法節省更多。我們的演算法在資料庫中長度為1的高頻項目組的個數與單筆交易長度的差距越大時,效能越好。
The application of association rules becomes more and more popular in the field of data mining. There are more people becoming interested in this research and getting fruitful results. In general, most association rule mining algorithms spend a lot of time scanning databases, or generate a lot of candidates. In addition, finding Maximal Frequent Itemset (MFI) becomes an important issue. A representative of traditional algorithm to find maximal frequent itemsets is the Pincer-search. This algorithm can prune the infrequent candidates quickly, but it still has too many candidates that require a large number of database scans. The CMFI [6] is an algorithm without this drawback, and it employs the concept of closed set. However, this algorithm only improves the bottom-up process of the Pincer-Search, and saves a limited number of database scanning.
In this research we propose a new approach, the BCTI (Bottom-up Closed and Top-down Intersection) algorithm to solve these problems. Our method combines the concepts of Pincer-Search, closed set and Intersection. First, the BCTI algorithm applies the concepts of top-down and bottom-up search of the Pincer-Search. In the bottom-up process, our algorithm accelerates the speed of searching. In the top-down process, we use the concept of intersection to generate possible candidates. In order to improve the efficiency of intersection operation, we re-arrange the transaction table at the first time of database scanning. A transaction count table is used to record the number of every transaction appearing in the database, and is ordered by length and lexicographic sequence. Our method can not only find possible maximal frequent itemsets faster and reduce the unnecessary database scanning time, but also spend less memory space than the algorithms using the tree-structure. When the difference of the number of 1-itemsets and the length of transaction is greater, our algorithm has much better performance.
誌謝 i
中文摘要 ii
Abstract iii
Table of Contents v
List of Figures vii
List of Tables viii
Chapter 1 Introduction 1
1.1 Motivation 1
1.2 Thesis Organization 2
Chapter 2 Related Work 3
2.1 Data Mining 3
2.2 Basic Association Rule Mining 5
2.2.1 Apriori Algorithm 6
2.2.2 Partition Algorithm 7
2.3 Pincer-Search Algorithm 8
2.4 Close Algorithm 9
Chapter 3 Our Proposed Method 12
3.1 Overview 12
3.2 BCTI Algorithm 13
3.2.1 Closed Itemset 13
3.2.2 Top-down Intersection Mining 15
3.2.3 The Concept of BCTI 17
3.2.4 The Process of BCTI 21
3.3 Example of BCTI 23
Chapter 4 Experimental Results 28
4.1 Environment and Data 28
4.2 Experimental Result and Discussion 29
Chapter 5 Conclusions and Future Work 34
5.1 Conclusions 34
5.2 Future work 34
References 36
[1]R. Agrawal and R. Srikant, "Fast Algorithms for Mining Association Rules in Large Databases," Proceedings of the 20th International Conference on Very Large Data Bases, pp. 487-499, 1994.

[2]K. J. Anil and C. D. Richard, Algorithms for clustering data. Prentice-Hall, Inc., 1988.

[3]M. Berry and G. Linoff, Data Mining Techniques for Marketing, Sales and Customer Support. Hoboken, NJ: Wiley, 1997.

[4]D. Burdick, M. Calimlim, J. Flannick, J. Gehrke and T. M. Yiu, "MAFIA: A maximal frequent itemset algorithm," IEEE Transactions on Knowledge and Data Engineering, Vol. 17, No. 11, pp. 1490-1504, 2005.

[5]M. S. Chen, J. Han and P. S. Yu, "Data mining: an overview from a database perspective," IEEE Transactions on Knowledge and Data Engineering Vol. 8, No. 6, pp. 866-883, 1996.

[6]C. J. Chu, An Efficient Closure-Based Method for Discovering Maximal Frequent Itemsets. Master thesis, Feng Chia University, 2003.

[7]K. Gouda and M. J. Zaki, "GenMax: An efficient algorithm for mining maximal frequent itemsets," Data Mining and Knowledge Discovery, Vol. 11, No. 3, pp. 223-242, 2005.

[8]G. Grahne and J. F. Zhu, "Fast algorithms for frequent itemset mining using FP-trees," IEEE Transactions on Knowledge and Data Engineering, Vol. 17, No. 10, pp. 1347-1362, 2005.

[9]J. Hipp and G. Lindner, "Analyzing Warranty Claims of Automobiles:An Application Description Following the CRISP-DM Data Mining Process," Proceedings of the 5th International Computer Science Conference, pp. 31-40, Dec, 1999, Hong Kong.

[10]E. Hüllermeier, "Possibilistic Induction in Decision-Tree Learning," Proceedings of the 13th European Conference on Machine Learning, pp. 173-184, 2002.

[11]D. I. Lin and Z. M. Kedem, "Pincer-search: An efficient algorithm for discovering the maximum frequent set," IEEE Transactions on Knowledge and Data Engineering, Vol. 14, No. 3, pp. 553-566, 2002.

[12]N. Pasquier, Y. Bastide, R. Taouil and L. Lakhal, "Discovering frequent closed itemsets for association rules," Proceedings of the Seventh International Conference on Database Theory (ICDT'' 99), pp. 398-416, 1999.

[13]N. Pasquier, Y. Bastide, R. Taouil and L. Lakhal, "Efficient mining of association rules using closed itemset lattices," Information Systems, Vol. 24, No. 1, pp. 25-46, 1999.

[14]J. R. Quinlan, C4.5: programs for machine learning. Morgan Kaufmann Publishers Inc., 1993.

[15]J. Roberto and J. Bayardo, "Efficiently mining long patterns from databases," Proceedings of the 1998 ACM SIGMOD International Conference on Management of Data, pp. 85-93, 1998, Seattle, Washington, United States.

[16]K. Srikymar and B. Bhasker, "Efficiently mining maximal frequent sets for discovering association rules," Proceedings of the Seventh International Database Engineering and Applications Symposium, pp. 104-110, 2003.

[17]H. Wang, Q. H. Li, C. X. Ma and K. L. Li, "A maximal frequent itemset algorithm," Rough Sets, Fuzzy Sets, Data Mining, and Granular Computing, pp. 484-490, 2003.

[18]J. Y. Wang, J. W. Han, Y. Lu and P. Tzvetkov, "TFP: An efficient algorithm for mining top-K frequent closed itemsets," IEEE Transactions on Knowledge and Data Engineering, Vol. 17, No. 5, pp. 652-664, 2005.

[19]R. Wille, "Concept lattices and conceptual knowledge systems," Computers Math. Application, Vol. 23, No. 6-9, pp. 493-515, 1992.

[20]M. J. Zaki, "Generating non-redundant association rules," Proceeding of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 34-43, 2000, Boston, MA.

[21]M. J. Zaki and C. J. Hsiao, "Efficient algorithms for mining closed itemsets and their lattice structure," IEEE Transactions on Knowledge and Data Engineering, Vol. 17, No. 4, pp. 462-478, 2005.
QRCODE
 
 
 
 
 
                                                                                                                                                                                                                                                                                                                                                                                                               
第一頁 上一頁 下一頁 最後一頁 top