(3.235.11.178) 您好!臺灣時間:2021/03/07 08:21
字體大小: 字級放大   字級縮小   預設字形  
回查詢結果

詳目顯示:::

我願授權國圖
: 
twitterline
研究生:邱士軍
研究生(外文):Shih-Chun Chiu
論文名稱:關聯規則演算法之實作和效能評估
論文名稱(外文):Implementataion and Performance Evaluation of Association Rule Mining Algorithms
指導教授:呂永和
指導教授(外文):Yungho Leu
學位類別:碩士
校院名稱:國立臺灣科技大學
系所名稱:資訊管理系
學門:電算機學門
學類:電算機一般學類
論文種類:學術論文
論文出版年:2002
畢業學年度:90
語文別:中文
論文頁數:62
中文關鍵詞:資料探勘關聯規則效能評估效能分析實作
外文關鍵詞:Data MiningAssociation RulePerformance EvaluationPerformance AnalysisImplememtation
相關次數:
  • 被引用被引用:4
  • 點閱點閱:651
  • 評分評分:系統版面圖檔系統版面圖檔系統版面圖檔系統版面圖檔系統版面圖檔
  • 下載下載:0
  • 收藏至我的研究室書目清單書目收藏:2
關聯規則的探勘一直是資料探勘領域相當熱門的一環。近幾年來,新的演算法不斷地被提出,但這些演算法的效能卻很少被第三團體來分析評估。實際應用中,一份正確的評估報告可以成為選擇演算法的依據,因此有必要以更客觀的方式比較這些演算法。本研究中我們分析五個知名的演算法,Apriori、Boolean、FP-Growth、Maxminer及DIC。我們對每個演算法都採多次實作,並從作者及我們的的實作版本中選擇最好的來比較。同時也提出這些演算法的部份實作細節供參考。測試資料庫方面,我們採用三個資料分佈情形不同的人造資料庫及一個真實資料庫。它們分別是T5I2、T10I4、T20I6及Foodmart。透過實驗結果,我們發現各個演算法的效能高低並非絕對。Apriori、DIC演算法在處理資料量小的時侯領先; Boolean和FP-Growth演算法在面對大量資料時效能卓越。另外Boolean和FP-Growth演算法在交易長度較長的資料庫也有優異的表現。我們也發現FP-tree的所須要的記憶體並不會比資料庫本身來的小。
Mining of association rules is a popular research area in data mining. Many mining algorithms for association rules have been proposed in the recent years. Every author of the mining algorithm claims that his algorithm is the best under some specific conditions. A fair comparison serves as a guide for choosing the right mining algorithm for a given specific condition. Unfortunately, no fair third party has conducted comprehensive comparisons among the association rule mining algorithms. In this thesis, we perform performance comparisons on five well-known algorithms. Among them are Apriori, Boolean, FP-Growth, Maxminer and DIC algorithms. We implemented several versions for each algorithm. Then, we choose the most efficient implementation among our implementations and the implementation provided directly by the original author if one is available. We also describe the details of our implementations. To compare the performance of the algorithms, we use three synthetic transactional databases generated by the IBM dataset generator and the FoodMart database, a real transactional database from SQL Server. The three synthetic databases are T5I2, T10I4 and T20I6. They have different mean transaction length and mean frequent itemsets length.
Experiments show that no algorithm prevails in all circumstances. The Apriori algorithm and the DIC algorithm prevail when the minimum support is high and, therefore, less computation time is needed. On the other hand, the Boolean algorithm and the FP-Growth algorithm scale up well in the sense that they prevail under low minimum support. Furthermore, the Boolean algorithm and the FP-Growth algorithm significantly outperform other algorithms when the mean transaction length is long. Besides, we also found that the memory size occupied by the FP-tree is at least as large as the transactional database itself.
目錄
中文摘要 I
英文摘要 II
圖表索引 V
誌 謝 X
第一章、緒論 1
1.1 資料探勘簡介 1
1.2 研究目的與動機 2
1.3 研究貢獻 3
1.4 論文架構 4
第二章、關聯規則演算法 5
2.1 關聯規則 5
2.2 關聯規則演算法 6
2.3 關聯規則演算法分類 7
第三章、測試演算法及實作心得 9
3.1 Apriori演算法及實作 9
3.1.1 Apriori演算法 9
3.1.2 Apriori演算法的實作心得 11
3.2 Boolean演算法及實作 18
3.2.1 Boolean演算法 18
3.2.2 Boolean演算法的實作心得 21
3.3 FP-Growth演算法及實作 25
3.3.1 FP-Growth演算法 25
3.3.2 FP-Growth演算法的實作心得 27
3.4 Maxminer演算法及實作 31
3.5 DIC演算法及實作 33
3.5.1 DIC演算法 33
3.5.2 DIC演算法的實作心得 36
第四章、測試資料庫及測試項目 37
4.1 測試資料庫 37
4.1.1 測試資料庫屬性 38
4.1.2 測試資料庫交易長度分佈情形 38
4.1.3 不同最小支持度對應之高頻項目組數量 39
4.2 測試項目及目的 41
第五章、實驗結果與分析 44
5.1 測試環境介紹 44
5.2 參與比較的程式 45
5.3 與時間有關的實驗結果與分析 46
5.4 其它實驗結果與分析 56
第六章、結論 59
參考文獻 60
作者簡介 62
[1] R. Agrawal, T. Imielinski, and A. Swami, “Mining association rules between sets of items in large databases,” Proceedings of ACM SIGMOD International Conference on Management of Data, pages 207-216, May 1993.
[2] R. Agrawal, and R. Srikant, “Fast algorithms for mining association rules in large database,” Proceedings of the International Conference on Very Large Data Bases, pages 487-499, September 1994.
[3] Yungho Leu, and Suh-Ying, “An Effective Boolean Algorithm for Mining Association Rules in Large Database,” Proceedings of the International Conference on Database Systems for Advanced Applications, pages 179-186, April 1999.
[4] Han, J., Pei J. and Yin Y., “Mining Frequent Patterns without Candidate Generation,” Proceedings of ACM-SIGMOD International Conference on Management of Data, pages 1-12, May 2000.
[5] R. J. Bayardo, “Efficiently Mining Long Patterns from Databases,” Proceedings of ACM SIGMOD International Conference on Management of Data, pages 85-93, 1998.
[6] S. Brin, R. Motwani, J. D. Ullman, and S. Tsur, “Dynamic Itemset Counting and Implication Rules for Market Basket Data,” Proceedings of ACM SIGMOD International conference on Management of Data, pages 255-264, May 1997.
[7] Zijain Z., Ron K., and Llew M., “Real World Performance of Association Rule Algorithms,” Proceedings of ACM-SIGKDD International Conference on Management of Data, pages 401-406, 2001.
[8] J. S. Park, M. S. Chen, and P. S. Yu. “An Effective Hash-Based Algorithm for Mining Association Rules,“ Proceedings of ACM-SIGMOD International Conference on Management of Data, pages 175-186, 1995.
[9] A. Savasere, E. Omiecinski, and S. Navathe, “An Efficient Algorithm for Mining Association Rules in Large Databases,” Proceedings of the International Conference on Very Large Data Bases, pages 432-444, 1995.
[10] Mohammed J. Z., Srinivasan P., Wei Li, and Mitsunori O., “Evaluation of sampling for data mining of association rules,” Technical Report 617, Computer Science Dept., U. Rochester.
[11] R. Agarwal, C. Agarwal, and V. V. V. Prasad, “A tree projection algorithm for generation of frequent itemsets,” Journal of Parallel and Distributed Computing, vol 6, num 3, pages 350-371, 2001.
QRCODE
 
 
 
 
 
                                                                                                                                                                                                                                                                                                                                                                                                               
第一頁 上一頁 下一頁 最後一頁 top
無相關期刊
 
系統版面圖檔 系統版面圖檔