跳到主要內容

臺灣博碩士論文加值系統

(44.192.20.240) 您好!臺灣時間:2024/03/04 12:18
字體大小: 字級放大   字級縮小   預設字形  
回查詢結果 :::

詳目顯示

我願授權國圖
: 
twitterline
研究生:林書帆
研究生(外文):Shu-Fan Lin
論文名稱:一個基於排容原理與交易映射的頻繁項目集探勘演算法
論文名稱(外文):A Frequent Itemset Mining Algorithm based on Principle of Inclusion-Exclusion and Transaction Mapping
指導教授:廖宜恩廖宜恩引用關係
指導教授(外文):I-En Liao
口試委員:高勝助李漢銘
口試日期:2011-06-20
學位類別:碩士
校院名稱:國立中興大學
系所名稱:資訊網路多媒體研究所
學門:電算機學門
學類:網路學類
論文種類:學術論文
論文出版年:2011
畢業學年度:99
語文別:中文
論文頁數:52
中文關鍵詞:關聯規則探勘頻繁項目集探勘演算法排容原理交易映射區間串列
外文關鍵詞:Association Rule MiningFrequent Itemset MiningPrinciple of Inclusion-ExclusionTransaction MappingInterval List
相關次數:
  • 被引用被引用:0
  • 點閱點閱:239
  • 評分評分:
  • 下載下載:0
  • 收藏至我的研究室書目清單書目收藏:0
關聯規則探勘(Association Rule Mining)是目前最重要的資料挖掘問題之一,它的目的是要從銷售的交易資料庫中發現商品項目間的關聯性,進而提供決策者作為參考的依據。而在探勘關聯規則的問題中最重要且複雜的一個步驟是從資料庫中找出所有的頻繁項目集。
本論文提出了一種嶄新的探勘頻繁項目集演算法,我們稱為PIETM(Principle of Inclusion-Exclusion and Transaction Mapping)演算法。其特色為有效地結合了知名演算法Apriori與FP-growth的優點,並導入排容原理與交易映射來計算項目集支持度。其中我們也利用FP-Growth所提出的樹狀結構之特性有效地輔助排容原理的計算。此外,我們更提出了重組排容原理公式提升計算效能的方法,可以提升排容原理的計算速度約四倍。實驗結果顯示,PIETM能在屬性複雜的資料庫中有效地探勘,當最小支持度極小時,PIETM仍能有效地從資料庫中探勘出所有的頻繁項目集,其原因在於PIETM演算法在探勘時不需重覆建立與追蹤條件樹,另外也不需重複掃描資料庫便可以得到候選項目集之支持度。


Association rules mining is one of the most important data mining techniques. It is useful for discovering relationships among different items to provide information for policy-maker. The most important and complex step in mining association rules is to discover all frequent itemsets from transaction database.
In this paper, we present a novel algorithm for mining frequent itemsets, and this algorithm is referred as the PIETM(Principle of Inclusion-Exclusion and Transaction Mapping) algorithm. The characteristics of this algorithm are that it combines the advantages of two famous algorithms – Apriori and FP-Growth, and that it caculates the support of itemset using by Principle of Inclusion-Exclusion and Transaction Mapping. In this algorithm, we also employ the characteristics of tree structure in FP-Tree to effectively help Principle of Inclusion-Exclusion Theory for caculating support of itemset. Besides, we reformulate the Principle of Inclusion-Inclusion and present a method that can speed up the caculation of Principle of Inclusion-Exclusion 4 times compared to the original formulation. The experimental results show that the PIETM algorithm is effcient even in the database that has very large number of differnt items. This is because we are able to mine frequent itemset without scaning database repeatly, nor traversing any tree structure recursively.


目錄
第一章 緒論 - 1 -
1.1 研究背景與動機 - 1 -
1.2 問題定義 - 5 -
1.3 主要貢獻 - 5 -
1.4 論文架構 - 6 -
第二章 相關研究 - 7 -
2.1 由下而上演算法 - 8 -
2.1.1 Apriori演算法 - 9 -
2.1.2 DHP演算法 - 10 -
2.1.3 Partition演算法 - 12 -
2.1.4 DIC演算法 - 13 -
2.2 由上而下演算法 - 14 -
2.2.1 TDM演算法 - 14 -
2.3其他類型演算法 - 15 -
2.3.1 TM演算法 - 15 -
2.3.2 Pincer-Search演算法 - 17 -
2.3.3 FP-Growth演算法 - 21 -
第三章 基本定理 - 24 -
3.1 排容原理 - 24 -
3.2 交易樹 - 26 -
3.3 交易映射和區間串列的建立 - 28 -
第四章 基於排容原理與交易映射的頻繁項目集探勘演算法 - 32 -
4.1 區間串列的聯集 - 32 -
4.2 PIETM演算法 - 34 -
4.3 提高使用排容原理公式的效率 - 36 -
第五章 系統實作與實驗 - 39 -
5.1 系統實作環境 - 39 -
5.2 實驗設計與目的 - 40 -
5.3 實驗結果分析與討論 - 41 -
第六章 結論與未來研究方向 - 47 -
6.1 結論 - 47 -
6.2 未來研究方向 - 48 -
參考文獻 - 50 -


[1]R. Agrawal, T. Imielinski, and A. Swami, “Mining Association Rules Between Sets of Items in Large Databases,” Proc. of the ACM SIGMOD Conf. on Management of Data, 1993, pp. 207-216.
[2]R. Agrawal and R. Srikant, “Fast Algorithm for Mining Association Rules in Large Databases,” Proc. of 20th VLDB Conf., 1994, pp. 487-499.
[3]R. Agrawal and R. Srikant, “Mining Sequential Patterns: Generalizations and Performance Improvements,” Proc. of the 5th International Conference on Extending Database Technology, pp. 3-14, 1996.
[4]S. Brin, R. Motwani, J. D. Ullman, and S. Tsur, “Dynamic Itemset Counting and Implication Rules for Market Basket Data,” ACM-SIGMOD Conference Management of Data, pp. 255-264, 2005.
[5]D. Burdick, M. Calimlim, J. Flannick, J. Gehrke, and T. Yiu, “MAFIA: A Maximal Frequent Itemset Algorithm,” IEEE Transactions on Knowledge and Data Engineering, Vol. 17, No. 11, pp. 1490-1504, 2005.
[6]A. Ghoting, G. Buehrer, S. Parthasarathy, D. Kim, A. Nguyen, Y. Chen, and P. Dubey, “Cache-Conscious Frequent Pattern Mining on Modern and Emerging Processors,” The International Journal on Very Large Data Bases, Vol. 16, Issue 1, pp. 77-96, 2007.
[7]Gosta Grahne and Jianfei Zhu, “Fast Algorithms for Frequent Itemset Mining Using FP-trees,” IEEE Transactions on Knowledge and Data Engineering, Vol. 17, No. 10, pp. 1347-1362, 2005.
[8]Jiawei Han, Jian Pei, and Yiwen Yin, “Mining Frequent Patterns without Candidate Generation,” Proc. of the ACM-SIGMOD Conf. Management of Data, 2000, pp. 1-12.
[9]Jiawei Han, Jian Pei, Yiwen Yin, and Runying Mao, “Mining Frequent Patterns without Candidate Generation: A Frequent-Pattern Tree Approach,” Data Mining and Knowledge Discovery, Vol. 8, Issue 1, 2004, pp. 53-87.
[10]Kuen-Fang Jea, Ke-Chung Lin, and I-En Liao, “Mining Hybrid Sequential Patterns by Hierarchical Mining Technique,” International Journal of Innovative Computing, Information and Control (IJICIC), Vol. 5, No.8, pp.1349-4198 2009
[11]D. Lin and Z. M. Kedem, “Pincer-Search: A New Algorithm for Discovering the Maximum Frequent Set,” IEEE Transactions on Knowledge and Data Engineering, Vol. 14, No. 3, pp. 553-566, 2002.
[12]Ke-Chung Lin, I-En Liao and Zhi-Sheng Chen, “An improved frequent pattern growth method for mining association rules,” Expert Systems with Applications, Volume 38, Issue 5, May 2011, pp. 5154-5161.
[13]G. D. Mulligan and D. G. Corneil, “Corrections to Bierstone’s Algorithm for Generating Cliques,” In J. Association of Computing Machinery, Vol. 19, No. 2, pp. 244-247, 1972.
[14]S. Orlando, P. Palmerini, R. Perego, and F. Silvestri, “Adaptive and Resource-Aware Mining of Frequent Sets,” In Proc. The 2002 IEEE International Conference on Data Mining(ICDM’ 02), pp.338-345, 2002.
[15]S. Orlando, C. Lucchese, P. Palmerini, R. Perego, and F. Silvestri, “kDCI: a Multi-Strategy Algorithm for Mining Frequent Sets,” Proc. of IEEE ICDM Workshop on Frequent Itemset Mining Implementations, 2003.
[16]J. S. Park, M. S. Chen, and P. S. Yu, “Using a Hash-based Method with Transaction Trimming for Mining Association Rules,” IEEE Transactions on Knowledge and Data Engineering, Vol. 9, No. 5, 1997, pp. 813-825.
[17]B. Racz, “nonordfp: An FP-growth variation without rebuilding the FP-tree,” Proc. of IEEE ICDM Workshop on Frequent Itemset Mining Implementations, 2004.
[18]Ashok Savasere, Edward Omiecinski, Shamkant Navathe, “An Efficient Algorithm for Mining Association Rules in Large Database,” Proc. of the 21st VLDB Conference, Zurich, Swizerland, 1995.
[19]Mingjun Song and Sanguthevar Rajasekaran, “A Transaction Mapping Algorithm for Frequent Itemsets Mining,” IEEE Transactions on Knowledge and Data Engineering, Vol. 18, No. 4, 2006, pp. 472-481.
[20]Mingjun Song and Sanguthevar Rajasekaran, “Finding Frequent Itemsets by Transaction Mapping,” Proc. of the 2005 ACM symposium on Applied computing, 2005, pp. 498-492.
[21]M. J. Zaki, S. Parthasarathy, M. Ogihara, and W. Li, “New Algorithms for Fast Discovery of Association Rules,” Proc. of 3rd Knowledge Discovery & Data Mining Conference, pp. 283-286, 1997.
[22]Shiwei Zhu, Renqian Zhang and Guoping Xia, “APT-Structure: Efficient Mining of Frequent Patterns,” Proc. of 2010 International Conference on E-Business and E-Government, 2010, pp. 1395-1398.
[23]IBM Synthetic Data Generator,
http://www.cs.nmsu.edu/~cgiannel/assoc_gen.html
[24]Frequent Itemset Mining Implementations Repository,
http://fimi.ua.ac.be/data/


QRCODE
 
 
 
 
 
                                                                                                                                                                                                                                                                                                                                                                                                               
第一頁 上一頁 下一頁 最後一頁 top