National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)

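Author: 江俊彥
Thesis title: 應用分群法提昇關聯法則效率之研究
Thesis title (English): A study of improving association rule efficiency with clustering method
Advisor: 蔡玉娟
Degree: Master's
Institution: National Pingtung University of Science and Technology (國立屏東科技大學)
Department: Department of Management Information Systems (資訊管理系)
Discipline: Computer Science
Sub-discipline: General Computer Science
Thesis type: Academic thesis
Year of publication: 2002
Graduation academic year: 90 (ROC calendar)
Language: Chinese
Number of pages: 102
Keywords (Chinese): 資料探勘; 關聯法則; 分群法; Table
Keywords (English): data mining; association rules; clustering; table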
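  Enterprise data volumes grow rapidly as they accumulate over time, and much useful information is hidden in these large data sets. Information uncovered through data mining can serve as a reference for business decisions. Association rule mining is the most widely studied and used data mining technique. Although it can discover item combinations in large databases and build association rules from them, its procedure must start from single-item itemsets and expand level by level to find all frequent itemsets. Its main drawback is that it requires repeated database scans and generates a large number of candidate itemsets, which makes it inefficient. To address this, this study proposes an improved algorithm, the TBCP (Table-Based Clustering and Pruning) association rule algorithm. The method focuses on performance and needs to scan the database only once. After preprocessing, it prunes useless transaction records to reduce memory consumption, then clusters the database by transaction record length and stores the clusters in cluster tables. During matching, it uses the properties of the clusters to prune: unnecessary candidate itemsets can be identified and removed early without comparing against every transaction record, which reduces both the number of database scans and the matching time while still guaranteeing complete and correct mining results. Experimental results show a clear gain in execution efficiency: under the same database or support conditions, the difference from the Apriori algorithm reaches more than a factor of ten at most. The greater the length, the fewer transaction records need to be scanned and the larger the improvement, and the method's execution efficiency remains stable as the support threshold varies.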
  Discovering association rules is an important and popular data mining task. Several algorithms have been proposed for this problem, but they have weaknesses: they often require repeated passes over the database and generate a large number of candidate itemsets. In this paper, we present a new algorithm for efficient association rule mining that overcomes these drawbacks, named the Table-Based Clustering and Pruning (TBCP) association rule algorithm. The first step in TBCP is to generate the frequent 1-itemsets and use them to prune the database. The second step is to build the clustering tables in a single scan of the database, assigning each transaction record of length k to the k-th clustering table. The main characteristics of TBCP are that it scans the database only once and then compares candidates against the clustering tables. It reduces the number of database scans and requires less memory while still ensuring the correctness of the mined results. Experiments with real-life databases show that TBCP outperforms Apriori, a well-known and widely used association rule algorithm.
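The steps named in the abstract (prune by frequent 1-itemsets, group records by length into tables, count candidates only against tables that can contain them) can be illustrated with a minimal Python sketch. It is reconstructed from the abstract alone, not from the thesis's own pseudocode, so the function name `tbcp_sketch`, the `min_count` parameter, the Apriori-style candidate join, and the use of in-memory lists are all assumptions made for illustration. In particular, the sketch walks the in-memory list twice for readability; how the thesis arranges its single database scan is not stated in the abstract.

```python
from collections import defaultdict
from itertools import combinations


def tbcp_sketch(transactions, min_count):
    """Hypothetical sketch: return {frozenset(itemset): support_count}."""
    # Step 1: count single items and keep the frequent ones.
    item_counts = defaultdict(int)
    for record in transactions:
        for item in set(record):
            item_counts[item] += 1
    frequent_items = {i for i, c in item_counts.items() if c >= min_count}
    result = {frozenset([i]): item_counts[i] for i in frequent_items}

    # Step 2: drop infrequent items from each record, then store the
    # record in the table that matches its remaining length.
    tables = defaultdict(list)
    for record in transactions:
        pruned = frozenset(record) & frequent_items
        if pruned:
            tables[len(pruned)].append(pruned)

    # Step 3: level-wise search. A k-itemset can only occur in records
    # of length >= k, so shorter tables are never compared.
    max_len = max(tables, default=0)
    current = [frozenset([i]) for i in frequent_items]
    k = 2
    while current and k <= max_len:
        # Assumed candidate step: join pairs of frequent (k-1)-itemsets.
        candidates = {a | b for a, b in combinations(current, 2)
                      if len(a | b) == k}
        eligible = [rec for length in range(k, max_len + 1)
                    for rec in tables.get(length, [])]
        current = []
        # Early exit: too few long-enough records to reach min_count.
        if len(eligible) >= min_count:
            for cand in candidates:
                count = sum(1 for rec in eligible if cand <= rec)
                if count >= min_count:
                    result[cand] = count
                    current.append(cand)
        k += 1
    return result


demo = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"],
        ["a", "b", "c", "d"]]
frequent = tbcp_sketch(demo, min_count=3)
```

In this toy run, item `d` is pruned in step 2, so the last record lands in the length-3 table. At k = 3 only two records are long enough, which is below `min_count`, so the candidate `{a, b, c}` is discarded without any record comparison. This is the kind of early pruning the abstract attributes to the clustering tables.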
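Table of Contents
Chinese Abstract
English Abstract
Acknowledgments
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Research Background and Motivation
1.2 Research Objectives
1.3 Research Process
1.4 Thesis Organization
Chapter 2 Literature Review
2.1 Data Mining
2.1.1 Knowledge Discovery and Data Mining
2.1.2 Data Mining Techniques and Applications
2.2 Clustering
2.3 Association Rules
2.3.1 Definition of Association Rules
2.3.2 The Apriori Association Rule Algorithm
2.3.3 Applications of Association Rules
2.4 Related Work on Association Rules
2.4.1 Partition Algorithm
2.4.2 Dynamic Itemset Count (DIC) Algorithm
2.4.3 Sampling Algorithm
2.4.4 Column-Wise Apriori Algorithm
2.4.5 Direct Hashing and Pruning (DHP) Algorithm
2.4.6 Pincer-Search Algorithm
2.4.7 Multiple Minimum Supports Algorithm
Chapter 3 Research Method
3.1 The TBCP Association Rule Algorithm
3.2 Implementation of the TBCP Association Rule Algorithm
3.3 Worked Example of the TBCP Association Rule Algorithm
Chapter 4 Experimental Results
4.1 Experimental Platform and Test Databases
4.2 Experimental Design
4.3 Data Analysis and Performance Evaluation
Chapter 5 Conclusions and Future Research Directions
5.1 Conclusions
5.2 Future Research Directions
References
About the Author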