跳到主要內容

臺灣博碩士論文加值系統

(54.91.62.236) 您好!臺灣時間:2022/01/18 00:14
字體大小: 字級放大   字級縮小   預設字形  
回查詢結果 :::

詳目顯示

: 
twitterline
研究生:蔡紋富
研究生(外文):Tsai Wun-Fu
論文名稱:以最小完美雜湊與資料修剪策略發掘關聯法則之研究
論文名稱(外文):A Minimal Perfect Hashing and Pruning Approach for Mining Association Rules
指導教授:黃國禎黃國禎引用關係
指導教授(外文):Hwang Gwo-Jen
學位類別:碩士
校院名稱:國立暨南國際大學
系所名稱:資訊管理學系
學門:電算機學門
學類:電算機一般學類
論文種類:學術論文
論文出版年:2002
畢業學年度:90
語文別:英文
中文關鍵詞:資料採擷關聯式法則最小完美雜湊修剪資料庫策略電子商務遠距教學
外文關鍵詞:data miningassociation rulesfrequent itemsetsminimal perfect hashingpruning stategye-commercedistance learning
相關次數:
  • 被引用被引用:1
  • 點閱點閱:445
  • 評分評分:
  • 下載下載:76
  • 收藏至我的研究室書目清單書目收藏:2
資料採擷(Data Mining)已逐漸受到各領域專家學者的重視了,而其中一個更受到各方熱烈討論重要的議題,就是「關聯式法則」(Association Rules)。「關聯式法則」在前幾個階段的候選組合產生,往往成為是否能改善資料採擷效能的關鍵點,特別是在找出frequent 2-itemsets的時候。在本篇論文中,我們提出一個結合最小完美雜湊和修剪資料庫策略的資料採擷演算法,MPHP(Minimal Perfect Hashing & Pruning),可在不需要產生Candidate 2-itemsets和額外的資料庫掃瞄,而直接產生Frequent 2-itemsets。由於MPHP所需的雜湊空間大小只與資料庫中的資料項目數量有關,所以MPHP特別適合用來處理含有大量交易筆數資料的大型資料庫。在評比的實驗中,當交易筆數範圍增加到1,000,000筆時,實驗結果顯示MPHP確實比其他演算法具有更好的效能。
Data-mining techniques have attracted the attention of researchers from various areas. One of the most important data-mining issues is the mining of association rules. It has been shown that the initial candidate set generation, especially for the frequent 2-itemsets, is the key issue to improve the performance of data mining. In this paper, a data-mining algorithm, MPHP, based upon a minimal perfect hashing and pruning strategy scheme is proposed. MPHP directly generates frequent 2-itemsets without need of extra database scan and L1*L1 concatenation, and hence improves the performance and stability. As the hashing space of MPHP is only related to the number of distinguishable data items in the database, MPHP is especially powerful for handling very large databases with huge amount of transactions. Some experiments have been made on the databases with the number of transactions ranging from 100,000 to 1,000,000. The experimental results show that MPHP can achieve better performances than previously proposed methods.
摘要 ………………………………………………………………………………. Ⅰ
Abstract …………………………………………………………………………. Ⅱ
誌謝 ………………………………………………………………………………. Ⅲ
Contents ..………………………………………………………………………. Ⅳ
Figure & Table Contents ………………………………………………………. Ⅵ
1. Introduction …………………………………………………………………… 1
2. Review of Previous Works …………………………………………………. 5
2.1 Problem Description …………………………………………………… 6
2.2 Apriori Algorithm ……………………………………………………….. 6
2.2.1 Hash-Tree: Search & Count Data Structure …………………… 9
2.2.2 The C1 to L2 Flow Chart of Apriori …………………………… 11
2.3 DHP (Direct hashing & pruning) Algorithm ………………………. 12
2.3.1 The C1 to L2 Flow Chart of DHP ………………………………… 14
2.4 Minimal Perfect Hashing Function with Chinese Remainder Theorem …………………………………………………………………….. 16
2.5 Bottlenecks of Finding Frequent Itemsets In Early Phase …….. 22
3. AN EFFICIENT ALGORITHM WITH MINIMAL PERFECT HASHING AND PRUNING SCHEME …………………………………………………………. 24
3.1 MPHP (Minimal Perfect Hashing & Pruning) Algorithm ………… 24
3.2 An Illustrative Example of Generating 2-itemsets Hash Function in MPHP ……………………………………………………………………... 32
3.3 Pruning Strategy in MPHP …………………………………………… 35
4. EXPERIMENTS AND RESULTS …………………………………………... 38
4.1 Generation of Experimental Data …………………………………... 38
4.2 Comparison of MPHP, DHP and Apriori Algorithm ……………. 40
4.3 Experiments with Increasing Number of Transactions and Items …………………………………………………………………………. 45
4.4 Improving Performance with Updated Transactions …………… 47
4.5 MPHP+ Algorithm (MPHP with DHP Pruning) ……………..……... 50
5. MEMORY ISSUES IN MPHP AND DHP …………………………………… 52
6. CONCLUSIONS AND FUTURE WORKS …………………………………. 55
REFERENCE …………………………………………………………………….. 57
Figures & Tables Contents
Fig. 2-1 An illustrative example a database with four transactions ……... 6
Fig. 2-2 Generating candidate itemsets and frequent itemsets by Apriori ……………………………………………………………………………… 7
Fig. 2-3 Construct the C2 Hash-Tree ………………………………………… 10
Fig. 2-4 Search and count on hash tree …………………………………… 10
Fig. 2-5 C1 to L2 flow chart of Apriori ………………………………………… 11
Fig. 2-6 An example of generating candidate 2-itemsets by DHP ……… 14
Fig. 2-7 C1 to L2 flow chart of DHP …………………………………………… 15
Fig. 3-1 The general MPHP algorithm ……………………………………….. 26
Fig. 3-2 An example of generating frequent 2-itemsets by MPHP ……… 34
Fig. 3-3 C1 to L2 flow chart of MPHP …………………………………………. 35
Fig. 3-4 An illustrative example of reducing transactions by MPHP …… 37
Fig. 4-1(a) Experiment on T5.I2.D100k ………………………………………. 40
Fig. 4-1(b) Experiment on T10.I4.D100k ……………………………………. 41
Fig. 4-1(c) Experiment on T20.I6.D100k ……………………………………. 41
Fig. 4-2 The comparison between MPHP and DHP with increasing transactions ……………………………………………………………………… 43
Fig. 4-3 The comparison between MPHP and DHP with increasing items ………………………………………………………………………………. 44
Fig. 4-4 The performance of MPHP with increasing number of transactions ………………………………….………………………………….. 45
Fig. 4-5 The performance of MPHP with increasing number of distinguishable data items ……….…………………………………………… 46
Fig. 4-6 MPHP performance for various database sizes with 100,000 transaction updating operations …………………………………………… 48
Fig. 4-7 The performance of MPHP on two databases with increasing number of update operations ……………………………………………….. 49
Fig. 4-8 The comparison of MPHP with different pruning strategies ….. 50
Fig. 5-1 Memory needed by H2 for N ranging from 1000-10000 ………… 52
Table. 2-1 A hash table of the example given in Fig. 2-6 with minimal perfect hash scheme ………………………………………………….……….. 17
Table. 3-1 All possible 3-itemsets in Fig. 2-1 ……………………………….. 29
Table 3-2 2-itemsets hash table based on MPHP scheme ……………….. 32
[1] R. Agrawal, T. Imielinski, and A.swami, “Mining Association Rules between the Sets of Items in Large Database,” Proc. ACM SIGMOD, pp. 207-216, May 1993.
[2] R. Agrawal and R. Srikant, “Fast Algorithms for Mining Association Rules,” Proceedings of the 20th VLDB Conference Santiago, Chile , 1994.
[3] R. Agrawal and J.C. Shafer, “Parallel mining of association rules,” IEEE Transactions on Knowledge and Data Engineering, Vol. 8, No. 6, pp. 962 —969, 1996.
[4] Charu C. Aggarwal and Philip S. Yu, “A New Approach to Online Generation of Association Rules,” Knowledge and Data Engineering, IEEE Transactions, Vol. 13, Issue: 4 , pp.527 —540, 2001
[5] Hussien H. Aly, Ashraf A. Amr, and Y. Taha, “Fast Mining of Association Rules in Large-Scale Problems,” Computers and Communications, 2001. Proceedings. Sixth IEEE Symposium , pp.107 —113, 2001
[6] “ARMiner Project,” http://www.cs.umb.edu/~laur/ARMiner/
[7] Michael J.A. Berry and Gordon Linoff, “Data Mining Techniques: For Marketing, Sales, and Customer Support,” Wiley Computer Publishing, 1997
[8] C.C. Chang, “The Study of an Ordered Minimal Perfect Hashing Scheme,” Communications of the ACM, Vol. 27, No. 4, pp. 384-387, 1984.
[9] C.C. Chang, “Apply Chinese Classical mathematics to Information Technology,” Information and Education magazine, 1991.8.
[10] D.W. Cheung, V.T. Ng, A.W. Fu and Yongjian Fu, “Efficient mining of association rules in distributed databases,” IEEE Transactions on Knowledge and Data Engineering, Vol. 8, No. 6, pp. 911 —922, 1996.
[11] M.S. Chen, J. Han and P.S. Yu, “Data mining: an overview from a database perspective,” IEEE Transactions on Knowledge and Data Engineering, Vol. 8, No. 6, pp. 866 —883, 1996.
[12] “Data Mining,” http://www.anderson.ucla.edu/faculty/jason.frand/teacher
/technologies/palace/index.htm
[13] E.H. Han, G.. Karypis and V. Kumar, “Scalable parallel data mining for association rules,” IEEE Transactions on Knowledge and Data Engineering, Vol. 12, No. 3, pp. 337 -352, 2000.
[14] J. Han and Y. Fu, “Mining multiple-level association rules in large databases,” IEEE Transactions on Knowledge and Data Engineering, Vol. 11, No. 5, pp. 798 -805, 2000.
[15] J. Han, Laks V.S. Lakshmanan and Raymond T. Ng, “Constraint-based, Multidimensional Data Mining,” IEEE Computer, August 1999, pp. 46-50.
[16] J. Hipp, U. Guntzer, and G. Nakhaeizadeh, “Algorithms for Association Rule Mining- Ageneral Survey and Comparison”, ACM SIGKDD Explorations, Vol.2, Issue 1, pp.58-64, 2000
[17] M. Houtsma and A. Swami, “Set-Oriented Mining for Association Rules in Relational Databases,” Proc. of 11th International Conference on Data Engineering, pp. 25-33, Mar. 1995.
[18] B. Liu, W. Hsu, S. Chen and Y. Ma, “Analyzing the subjective interestingness of association rules,” IEEE Intelligent Systems, Vol. 15, No. 5, pp. 47 —55, 2000.
[19] “IBM QUEST Group,” http://www.almaden.ibm.com/cs/quest/HOME.html
[20] N. Pasquier, Y. Bastide, R. Taouil, and L. Lakhal, “Discovering Frequent Closed Itemsets for Association Rules,” 7th International Conference on Database Theory, 1999
[21] S. Sarawagi, S. Thomas, and R. Agrawal, “Integrating Association Rule Mining with Relational Database Systems: Alternatives and Implications,” ACM SIGMOD Record, Proceedings of the 1998 ACM SIGMOD international conference on Management of data, Vol. 27, Issue 2, 1998
[22] J. Soo, M.S. Chen and Philip S. Yu, “Using a Hash-Based Method with Transaction Trimming and Database Scan Reduction for Mining Association Rules,” IEEE Trans. On Knowledge and Data Engineering, Vol.9, No.5, pp.813-825, 1997.
[23] W.F. Tsai and G.J. Hwang, “Using Minimal Perfect Hashing Algorithm to Improve Mining Performance,” Proceedings of Workshop on the 21st Century Digital Life and Internet Technologies, 2001
[24] “Tutorial on High Performance Data Mining”, http://www-users.cs.umn.edu/~mjoshi/hpdmtut/index.htm
[25] W. Wang, J. Yang, and Philip S. Yu, “Efficient Mining of Weighted Association Rules(WAR),” Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining, 2000
[26] M. J. Zaki, “Generating Non-Redundant Association Rules,” 6th ACM SIGKDD International Conference on knowledge discovery and data mining, 2000
[27] M. J. Zaki, S. P. W. Li, and M. Ogihara, “Evaluation of Sampling for Data Mining of Association Rules,” Research Issues in Data Engineering, 1997. Proceedings. Seventh International Workshop, pp.42 —50, 1997
[28] M. J. Zaki and C.J. Hsiao, “CHARM: An Efficient Algorithm for Closed Association Rule Mining”, Technical Report, RPI, 1999
[29] M.J. Zaki, “Scalable algorithms for association mining,” IEEE Transactions on Knowledge and Data Engineering, Vol. 12, No. 3, pp. 372 —390, 2000
QRCODE
 
 
 
 
 
                                                                                                                                                                                                                                                                                                                                                                                                               
第一頁 上一頁 下一頁 最後一頁 top
1. 吳淑美(民86):融合式班級設立之要件。特教新知通訊,4卷8期,1-2頁。
2. 邱上真(民80):學習策略教學的理論與實際。特殊教育與復健學報,1期,1-49頁。
3. 楊紹旦(民70):行動研究法及其實例。國教輔導月刊,20卷4期,頁5-7。
4. 吳淑美(民84):完全包含(Full inclusion)模式可行嗎?特教新知通訊,3卷3期,1-2頁。
5. 吳武典(民87)教育改革與特殊教育。載於國立教育資料館編印:教育改革專輯,教育資料集刊23輯,197-220頁。
6. 江芳盛(民87)從政策研究的觀點看美國教育改革。暨大學報,2卷1期,253-271頁。
7. 王天苗(民90):運用教學支援建立融合教育的實施模式:以一公立幼稚園的經驗為例。特殊教育研究學刊,21期,27-51頁。
8. 游家政(民88):九年一貫課程綱要總綱的理念與架構。教師天地,102期, 34-41頁。
9. 邱上真(民90):普通班教師對特殊需求學生之因應措施、所面對之困境以及所需之支持系統。特殊教育研究學刊,21期,1-26頁。
10. 游家政(民89):學校課程的統整及其教學。課程與教學,3卷,1期, 19-37頁。
11. 甄曉蘭(民84):合作行動研究(Cooperative Action Inquiry)-進行教育研究的另一種方式。嘉義師院學報,9期, 297-318頁。
12. 何華國(民89)澳洲特殊學生之融合教育。嘉義大學學報,69期,161-181頁。
13. 廖鳳池(民79):行動研究法簡介。諮商與輔導,60期, 5-9頁。
14. 歐用生(民89)九年一貫課程改革的經驗。國民教育,40卷,4期,2-9頁。
15. 蔣明珊、盧台華(民89):國小資優教材評鑑檢核表建構與試用之研究。特殊教育研究學刊,19期,347-370頁。