(3.236.214.19) 您好!臺灣時間:2021/05/10 04:35
字體大小: 字級放大   字級縮小   預設字形  
回查詢結果

詳目顯示:::

: 
twitterline
研究生:李孝屏
研究生(外文):Hsiao-Ping Lee
論文名稱:適用於靜態與動態資料庫的支持度導向關聯規則探勘方法之研究
論文名稱(外文):The Study on Support-Oriented Mining of Association Rules for Static and Dynamic Databases
指導教授:楊東麟楊東麟引用關係
指導教授(外文):Don-Lin Yang
學位類別:碩士
校院名稱:逢甲大學
系所名稱:資訊工程學系
學門:工程學門
學類:電資工程學類
論文種類:學術論文
論文出版年:2001
畢業學年度:89
語文別:中文
論文頁數:128
中文關鍵詞:靜態資料探勘動態資料探勘關聯規則支持度導向
外文關鍵詞:Static data miningDynamic data miningAssociation ruleSupport orientationDatabase
相關次數:
  • 被引用被引用:1
  • 點閱點閱:570
  • 評分評分:系統版面圖檔系統版面圖檔系統版面圖檔系統版面圖檔系統版面圖檔
  • 下載下載:20
  • 收藏至我的研究室書目清單書目收藏:2
電子商務的蓬勃發展,伴隨著顧客導向的商業趨勢,有效的預測消費心理與消費傾向,迎合消費者的消費習慣與需求,是提高營業收益的絕佳方法。為此,發展適用於靜態、動態探勘的高效能演算法,有效率地自歷史資料中找出有價值的關聯規則,並隨著資料的異動而動態地修正其內容,建構互動式的資料探勘環境,在競爭激烈的今日,已成為資料探勘相關研究中,頗受矚目的課題之一。
在探勘關聯規則的過程中,高頻項目集之搜尋是其中相當重要的一個程序,也是決定探勘表現的關鍵所在。傳統以Apriori為代表的高頻項目集探勘演算法,往往因其演算流程上的限制,存在著缺乏調適性、暫時結果不具實質意義以及因牽涉過多磁碟I/O而影響執行效能等缺點。另一方面,由於它們並未將項目集支持度的精神融入演算架構中,在動態探勘的應用上亦存在著缺乏效率且不易實現等問題。
在本研究中,我們提出了CPOG與ESO兩個高效能的探勘演算法,以期提供高頻項目集搜尋之課題的最佳解決方案。CPOG具有架構簡單、易於了解與實現,以及低磁碟I/O成本等特點,是一個適用於高支持度門檻限制環境的靜態探勘演算法,它利用物件間自然存在的階層關係,建構非高頻項目集的刪除機制,並透過數學模型的建立、定理的應用,搭配以集合論定理為基礎而設計的項目集支持度之計算方法,有效地提升整體的探勘效能。ESO是一個以項目集支持度為考量的支持度導向探勘演算法,可適用於靜態/動態資料或支持度變動頻繁之環境的探勘任務。它除了承襲CPOG的各項優點外,更進一步地利用已取得的部份結果為輔助,逐步完成使用者的探勘要求,讓計算與磁碟I/O成本能得到最佳的平衡。支持度導向的特性,讓ESO在執行時期所產生的部份結果擁有支持度層面的實質意義與可利用性,更使它在動態資料探勘的應用上,具備良好的彈性與適用性。
為了進一步地驗證演算法的執行效能,我們以模擬資料、實際的壽險保單和WWW日誌資料為標的,分別進行了數項實驗。由實驗所得結果可以看出,CPOG確實是個表現優異的靜態探勘演算法,而ESO不僅具備良好的靜態探勘能力,更可在開放且動態的探勘應用上,有效的達成互動式探勘的目標,其應用範圍可涵蓋電子商務與計算生物學...等在內的眾多領域與課題。
With the vast growth of e-commerce, it is more profitable to promote product sales based on customer-oriented analysis with individual buying habits and personal favorites that can be catered for targeted consumers. In recent years, the fierce competition of the e-business makes the task of developing suitable static and dynamic data mining with high performance a must. Using these efficient algorithms in an interactive discovery environment, one can extract valuable knowledge of association rules from various historical databases in an effective way. The evolution of mining new rules or modifying existing rules dynamically with the update of databases has also emerged as an attractive issue in the related study of data mining.
The most critical procedure in rule extraction is to identify large itemsets. It is also a key factor affecting the performance of mining association rules. Existing algorithms of finding large itemsets, such as Apriori, have many drawbacks including the lack of flexibility, adaptiveness, and usefulness of intermediate results in the mining process. These algorithms also involve too many disk I/Os in the discovery of large itemsets. Furthermore, they are inefficient and hard to adopt in the application of dynamic mining since the concept of support is not employed in their algorithmic architectures.
In this study, we propose two efficient algorithms, CPOG and ESO, to provide the best solutions of discovering large itemsets. CPOG has a simple algorithmic structure that is very easy to understand and implement. With a low cost of disk I/Os, it is an efficient algorithm suitable for static mining with high support thresholds. CPOG fully utilizes the inherent hierarchy of transaction items to provide a pruning mechanism of infrequent itemsets, applies mathematical models and formulas, and employs the support counting method based on the set theory to improve the efficiency of discovery process.
ESO is a support-oriented mining algorithm, which is mainly focused on the utilization of the support of itemsets. It is suitable for the environment that requires static, dynamic databases and provides varying thresholds of support. ESO doesn’t only inherit the advantages of CPOG, but also further the use of known support thresholds to meet the extraction requirement of user-defined minimum supports in a progressive manner. It makes the costs of support counting and disk I/O well balanced. The support-oriented property of ESO makes it possible to use the intermediate results from the execution steps with respect to known support thresholds. This property also makes ESO more flexible and adaptive in the dynamic mining.
To further evaluate the performance of our proposed algorithms, extensive experiments were conducted by using simulated databases, real insurance databases, and WWW log databases in the study. Based on the results from our mining experiments, CPOG is indeed a performance-excellent discovery algorithm for static databases. The ESO algorithm is not only very good at static mining, but also can be employed in the open and dynamic discovery application to perform interactive mining efficiently. The CPOG and ESO algorithms can be used in many research and commercial applications, such as e-commerce and computational biology, etc.
第1章 導論1
第2章 相關研究9
2.1 靜態資料探勘演算法9
2.1.1 Apriori演算法9
2.1.2 DHP演算法11
2.1.3 DIC演算法13
2.2 動態資料探勘演算法14
2.2.1 漸進式的資料探勘方法14
2.2.2 線上資料探勘演算法16
2.2.2.1 使用項目集絡的線上探勘方法17
2.2.2.2 使用縮減項目集絡的線上探勘演算法18
第3章 CPOG與ESO演算法21
3.1 使用之名詞、縮寫符號與定理22
3.1.1 名詞與縮寫符號之定義22
3.1.2 定理及其證明26
3.2 CPOG演算法34
3.2.1 用以提升探勘效能的機制35
3.2.1.1 GOP filter35
3.2.1.2 高頻項目集最大長度的預測37
3.2.1.3 簡化的項目集組合程序38
3.2.1.4 "長項目集優先"的項目集支持度計算方法39
3.2.2 CPOG演算法的流程與說明40
3.2.3 範例說明43
3.3 支持度導向的ESO演算法47
3.3.1 支持度導向之探勘演算法的概念與架構48
3.3.2 ESO演算法的架構52
3.3.3 靜態資料庫的探勘程序53
3.3.4 針對動態資料庫而進行的維護程序58
3.3.5 範例說明63
3.3.6 知識資料庫的縮減70
第4章 實驗結果與討論73
4.1 實驗環境與模擬資料的產生73
4.2 演算法效能評估的結果與探討77
4.2.1 CPOG演算法的探勘效能77
4.2.1.1 統計數值與討論78
4.2.1.2 時間效能與討論80
4.2.2 ESO演算法的探勘效能81
4.2.2.1 靜態資料探勘之時間效能與相關統計數值的討論82
4.2.2.2 支持度變動環境下的探勘實驗86
4.2.2.3 動態環境的資料探勘實驗87
4.3 演算法特性之量測的實驗結果與探討91
4.4 針對實際資料庫而進行的探勘實驗97
第5章 結論與未來展望102
5.1 結論102
5.2 本研究之貢獻105
5.3 未來展望107
參考文獻111
[1]N. Nayak, K. Bhaskaran, and R. Das, “Virtual enterprises-building blocks for dynamic e-business,” Proceedings of ITVE2001 Workshop on Information Technology for Virtual Enterprises, 2001, pp. 80-87.
[2]N. Juul and C. Loebbecke, “Commercial e-commerce servers and enterprise application integration: a case-based comparison of net.commerce and site server commerce,” Proceedings of the 34th Annual Hawaii International Conference on System Sciences, 2001, pp. 275-275.
[3]G. Fischer and J. Ostwald, “Knowledge management: problems, promises, realities, and challenges,” IEEE Intelligent Systems, Vol. 16, 2001, pp. 60-72.
[4]K. Joshi, “A framework to study knowledge management behaviors during decision making,” Proceedings of the 34th Annual Hawaii International Conference on System Sciences, 2001, pp. 102-102.
[5]J. Srivastava and Ping-Yao Chen, “Warehouse creation-a potential roadblock to data warehousing,” IEEE Transactions on Knowledge and Data Engineering, Vol. 11, Jan.-Feb. 1999, pp. 118-126.
[6]T. Govindaraj, E. E. Blanco, D. A. Bodner, M. Goetschalckx, L. F. McGinnis, and G. P. Sharp, “An on-line tutor for warehouse design,” IEEE International Conference on Systems, Man, and Cybernetics, Vol. 2, 2000, pp. 1158-1161.
[7]L. Bellatreche, K. Karlapalem, and M. Mohania, “OLAP query processing for partitioned data warehouses,” Proceedings of 1999 International Symposium on Database Applications in Non-Traditional Environments(DANTE ''99), 1999, pp. 35-42.
[8]M. J. Pazzani, “Knowledge discovery from data,” IEEE Intelligent Systems, Vol. 15, March-April 2000, pp. 10-12.
[9]C. Olaru and L. Wehenkel, “Data mining,” IEEE Computer Applications in Power, Vol. 12, July 1999, pp. 19-25.
[10]R. Agrawal, T. Imielinski, and A. Swami, “Mining association rules between sets of items in large databases,” Proceedings of the ACM SIGMOD Conference on Management of Data, 1993, pp. 207-216.
[11]S. M. Weiss and C. A. Kulikowski, “Computer System that Learn: Classification and Prediction Methods from Statistics, Neural Nets, Machine Learning, and Expert System,” Morgan Kaufman, 1991.
[12]L. Kaufman and P. J. Rousseeuw, “Finding Groups in Data: an Introduction to Cluster Analysis,” John Wiley & Sons, 1990.
[13]R. Agrawal and R. Srikant, “Mining Sequential Patterns,” IEEE ICDE, 1995, pp. 3-14.
[14]Bing Liu, Wynne Hsu, Shu Chen, and Yiming Ma, “Analyzing the subjective interestingness of association rules,” IEEE Intelligent Systems, Vol. 15, Sept.-Oct. 2000, pp. 47-55.
[15]J. Seitzer, J. P. Buckley, and Y. Pan, “INDED: a distributed knowledge-based learning system,” IEEE Intelligent Systems, Vol. 15, Sept.-Oct. 2000, pp. 38-46.
[16]Rajamani, B. Iyer, A. Cox, and A. Chadha, “Efficient Mining for Association Rules with Relational Database Systems,” Proceedings of International Symposium on Database Engineering and Applications (IDEAS ''99), 1999.
[17]S. Y. Wru and Y. Leu, “An effective boolean algorithm for mining association rules in large databases,” Proceedings of the 6th International Conference on Database System for Advanced Applications, 1999, pp. 179-186.
[18]P. Frasconi, M. Gori, and G. Soda, “Data categorization using decision trellises,” IEEE Transactions on Knowledge and Data Engineering, Vol. 11, Sept.-Oct. 1999, pp. 697-712.
[19]N. Lesh, M. J. Zaki, and M. Oglhara, “Scalable feature mining for sequential data,” IEEE Intelligent Systems, Vol. 15, March-April 2000, pp. 48-56.
[20]K. Wang and H. Q. Liu, “Discovering structural association of semistructured data,” IEEE Transactions on Knowledge and Data Engineering, Vol. 12, May-June 2000, pp. 353-371.
[21]Chen, L. Liu, N. Chen, and G. P. Xia, “Discovery of sequential patterns from large database in supply chain,” Proceedings of the 3rd World Congress on Intelligent Control and Automation, Vol. 3, 2000, pp. 1966-1970.
[22]F. Qin and X. B. Yang, “A high efficient algorithm of mining sequential patterns,” Proceedings of the 3rd World Congress on Intelligent Control and Automation, Vol. 5, 2000, pp. 3750-3752.
[23]M. Atallah, E. Bertino, A. Elmagarmid, M. Ibrahim, and V. Verykios, “Disclosure limitation of sensitive rules,” Proceedings of Knowledge and Data Engineering Exchange (KDEX ''99) Workshop on Knowledge and Data Engineering Exchange, 2000, pp. 45-52.
[24]http://cindy.cis.nctu.edu.tw/AI/ai5/Main.html
[25]http://cindy.cis.nctu.edu.tw/EC/FPG/fuzz_po/evolu_frame.htm
[26]Bing Liu, Wynne Hsu, Lai-Fun Mun, and Hing-Yan Lee, “Finding Interesting Patterns Using User Expectations,” IEEE Transactions on Knowledge and Data Engineering, Vol. 11, November-December 1999.
[27]http://cindy.cis.nctu.edu.tw/EC/FPG/fuzz_po/fuzz_frame.htm
[28]http://cindy.cis.nctu.edu.tw/AI96/team11/pg0.htm
[29]E. Ganti, J. Gehrke, and R. Ramakrishnan, “Mining Very Large Databases,” Computer, Vol. 32, Aug 1999, pp. 38-45.
[30]S. Berson and J. Smith, “Data Warehousing, Data Mining, and OLAP,” McGraw-Hill Companys, 1997.
[31]R. Feldman and R. Kashi, “A new and versatile method for association generation,” Principles of Data Mining and Knowledge Discovery, First European Symposium, PKDD ''97, 1997, pp. 221-231.
[32]R. Agrawal and R. Srikant, “Fast Algorithms for Mining Association Rules,” Proceedings of the 20th International Conference on Very Large Databases, September 1994.
[33]T. P. Hong, C. Y. Wang, and Y. H. Tao, “Incremental data mining based on two support thresholds,” Proceedings of the Fourth International Conference on Knowledge-Based Intelligent Engineering Systems and Allied Technologies, Vol. 1, 2000, pp. 436-439.
[34]V. Ganti, J. Gehrke, and R. Ramakrishnan, “DEMON: mining and monitoring evolving data,” Proceedings of the 16th International Conference on Data Engineering, 2000, pp. 439-448.
[35]H. K. Chu and M. H. Wong, “Interactive data analysis on numeric-data,” Proceedings of International Symposium on Database Engineering and Applications, IDEAS ''99, 1999, pp. 226-230.
[36]S. Thomas, S. Bodagala, K. Alsabti, and S. Ranka, “An efficient algorithm for the incremental updation of association rules in large databases,” Proceedings of the 3rd International Conference on Knowledge Discovery and Data Mining (KDD97), 1997, pp. 263-266.
[37]W. Cheung, S. D. Lee, and B. Kao, “A general incremental technique for maintaining discovered association rules,” Proceedings of the Fifth International Conference on Database Systems For Advanced Applications, 1997, pp. 185-194.
[38]W. Cheung, J. Han, V. T. Ng, and C. Y. Wong, “Maintenance of discovered association rules in large databases: An incremental updating technique,” Proceedings of the IEEE ICDE''96, 1996, pp. 106-114.
[39]K. K. Ng and W. Lam, “Updating of association rules dynamically,” Proceedings of 1999 International Symposium on Database Applications in Non-Traditional Environments (DANTE ''99), 1999, pp. 84-91.
[40]Hidber, “Online Association Rule Mining,” Proceedings of the ACM SIGMOD, 1999, pp. 145-156.
[41]C. Agrawal and P. S. Yu, “Online generation of association rules,” Proceedings of the IEEE ICDE''98, 1998, pp. 402-411.
[42]T. H. Hsu, “Online Generation of Association Rules in Dynamic Databases,” National Chung Hsin University, Master Thesis, 2000.
[43]M. J. Zaki, “Scalable algorithms for association mining,” IEEE Transactions on Knowledge and Data Engineering, Vol. 12, May-June 2000, pp. 372-390.
[44]S. M. Tseng, “Efficient mining of categorized association rules in large databases,” Proceedings of IEEE International Conference on Systems, Man, and Cybernetics, Vol. 5, 2000, pp. 3606-3610.
[45]B. Dunkel and N. Soparkar, “Data Organization and Access for Efficient Data Mining,” Proceedings of the 15th International Conference on Data Engineering, Sydney, Australia.
[46]Z. M. Kedem, “Pincer-Search: A New Algorithm for Discovering the Maximum Frequent Set,” Proceedings of the 6th International Conference on Extending Database Technology, 1998.
[47]J. S. Park, M. S. Chen, and P. S. Yu, “Using a Hash-Based Method with Transaction Trimming for Mining Association Rules,” IEEE Transactions on Knowledge and Data Engineering, Vol. 9, September-October 1997.
[48]S. Brin, R. Motwani, J. D. Ullman, and S. Tsur, “Dynamic Itemset Counting and Implication Rules for Market Basket Data,” Proceedings of 1997 ACM SIGMOD Conference on Management of Data, 1997, pp. 255-264.
[49]H. Toivonen, “Sampling Large Databases for Association Rules,” Proceedings of the VLDB, 1996, pp. 134-145.
[50]Savasere, E. Omiecinski, and S. Navathe, “An Efficient Algorithm for Mining Association Rules in Large Databases,” Proceedings of the 21st VLDB, 1995, pp. 432-444.
QRCODE
 
 
 
 
 
                                                                                                                                                                                                                                                                                                                                                                                                               
第一頁 上一頁 下一頁 最後一頁 top
系統版面圖檔 系統版面圖檔