跳到主要內容

臺灣博碩士論文加值系統

(18.97.14.85) 您好!臺灣時間:2024/12/07 01:37
字體大小: 字級放大   字級縮小   預設字形  
回查詢結果 :::

詳目顯示

: 
twitterline
研究生:黃正男
研究生(外文):Jheng-Nan Huang
論文名稱:在資料探勘中頻繁項目集的精簡表示法
論文名稱(外文):A Study on Compact Representation of Frequent Itemsets in Data Mining
指導教授:洪宗貝洪宗貝引用關係江明朝
指導教授(外文):Tzung-Pei HongMing-chao Chiang
學位類別:博士
校院名稱:國立中山大學
系所名稱:資訊工程學系研究所
學門:工程學門
學類:電資工程學類
論文種類:學術論文
論文出版年:2016
畢業學年度:105
語文別:英文
論文頁數:103
中文關鍵詞:封閉項目集最大項目集頻繁項目集資料挖掘近似支持度
外文關鍵詞:closed itemsetsmaximal itemsetsfrequent itemsetdata miningapproximate support
相關次數:
  • 被引用被引用:1
  • 點閱點閱:353
  • 評分評分:
  • 下載下載:31
  • 收藏至我的研究室書目清單書目收藏:0
在資料探勘中找出頻繁項目集對於產生關聯規則而言,是一個非常關鍵的步驟。若最小支持度被設定在一個非常小的數值,或者被處理的交易資料庫的交易項目非常多的時候,頻繁項目集的數目常可能會相當巨大,因此在過去有一些精簡表示的方法被提出,以有效紀錄頻繁項目集。例如,最大項目集的方法用保存頻繁與非頻繁項目集間的分隔線方式,以減少保存頻繁項目集所需的資訊。此種方法可回復所有頻繁項目集,但卻無法得知這些頻繁項目集的支持度。相較於最大項目集,封閉項目集的做法則可以同時回復頻繁項目集及其支持度,但所需之保存數量較多。因此在本論文中,我們提出一種介於最大項目集和封閉項目集之間的近似表示方式,其能正確回復完整的頻繁項目集,並能計算出這些頻繁項目集的近似支持度。我們也提出並推導了這個近似表示法的一些特性。此外我們利用兩種不同的資料結構來組織頻繁項目集並分別提出了兩個不同的方法來求其近似表示。第一個方法採用前綴樹的結構,而第二個方法則使用了流量網路的方式。最後我們也針對不同的資料集及變換多個參數值來驗證這兩個方法的效能,而實驗結果也顯示我們所提的方法和封閉項目集相比,確實可以達到較佳的壓縮比率。
In data mining, finding frequent itemsets is a critical step for deriving association rules. The number of frequent itemsets may be huge if the minimum support is set at a low value or if the number of items in a transaction database is large. In the past, some approaches were proposed to record frequent itemsets with compact representation. For example, the approach of maximal itemsets keeps a borderline composed of the maximal itemsets, which separate frequent itemsets from non-frequent ones. It can recover all the frequent itemsets, but cannot get their real supports back. On the contrary, the approach of closed itemsets can correctly recover each frequent itemset and its support. In the thesis, we propose approximate representation of derived frequent itemsets, which lies between maximal itemsets and closed itemsets. It can correctly get back all the derived itemsets and can calculate their approximate supports. Some properties with the representation are also derived. Two methods are then proposed to get the approximate representation by organizing frequent itemsets with two special data structures. The first one adopts a prefix tree, and the other utilizes a flow network. Finally, some experiments on different datasets and with a variety of parameter values are conducted to show the performance of the two methods. The results reflect the proposed methods have better compact rates than the closed itemsets.
誌謝 + i
中文摘要 + iii
ABSTRACT + iv
Chapter 1 INTRODUCTION + 1
1.1. Background and Motivation + 1
1.2. Thesis Organization + 4
Chapter 2 REVIEW OF RELATED WORKS + 5
2.1. Maximal Itemsets + 5
2.2. Closed Itemsets + 6
2.3. Other Concepts + 8
Chapter 3 THE METHOD BASED ON PREFIX TREE + 11
3.1. Term Definitions + 11
3.2. Problem Statement + 18
3.3. Proposed Concepts and Methods + 22
3.4. Experiments + 38
3.4.1. Experimental Datasets and Criteria + 38
3.4.2. Experimental Results + 39
Chapter 4 THE METHOD BASED ON FLOW NETWORK + 50
4.1. Term Definitions + 50
4.1.1. Flow Network + 51
4.1.2. Lattice of Frequent Itemsets + 52
4.1.3. Simple Flow Network of Frequent Itemsets + 54
4.1.4. Complex Flow Network of Frequent Itemsets + 57
4.2. Preliminary Observations and Problem Statement + 59
4.3. Proposed Concepts and Methods + 65
4.3.1. Setting Capacities of Correcting Edge Pairs + 65
4.3.2. Dividing the itemsets at a Level as Intervals + 69
4.4. Experiments + 77
4.4.1. Experimental Datasets and Criteria + 77
4.4.2. Experimental Results + 79
Chapter 5 DISCUSSION AND CONCLUSION + 85
5.1. Comparison of the Two Proposed Methods + 85
5.2. Comparison of the Two Methods with Closed Sets and Free Sets + 86
5.3. CONCLUSION AND FUTURE WORK + 87
REFERENCES + 88
[1]. Rakesh Agrawal, Tomasz Imieliński and Arun Swami. Mining Association Rules between Sets of Items in Large Databases. Proceedings of the ACM SIGMOD International Conference on Management of Data, pp. 207-216. 1993.
[2]. Roberto J. Bayardo Jr. Efficiently Mining Long Patterns from Databases. Proceedings of the ACM SIGMOD International Conference on Management of Data, pp. 85-93. 1998.
[3]. Richard Bellman. On a Routing Problem. Quarterly of Applied Mathematics, Vol. 16, No. 1, pp. 87-90. 1958.
[4]. Jean-François Boulicaut, Artur Bykowski and Christophe Rigotti. Free-Sets: A Condensed Representation of Boolean Data for the Approximation of Frequency Queries. International Journal of Data Mining and Knowledge Discovery, Vol. 7, Issue 1, pp. 5-22. 2003.
[5]. Toon Calders and Bart Goethals. Non-Derivable Itemset Mining. International Journal of Data Mining and Knowledge Discovery, Vol. 14, Issue 1, pp. 171-206. 2007.
[6]. Varun Chandola and Vipin Kumar. Summarization – Compressing Data into an Informative Representation. International Journal of Knowledge and Information Systems, Vol. 12, Issue 3, pp. 355-378. 2007.
[7]. Manike Chiranjeevi and Om Hari. Modified GUIDE (LM) Algorithm for Mining Maximal High Utility Patterns from Data Streams. International Journal of Computational Intelligence Systems, Vol. 8, Issue 3, pp. 517-529. 2015.
[8]. Arianna Gallo, Tijl De Bie and Nello Cristianini. MINI: Mining Informative Non-Redundant Itemsets. Proceedings of the 11th Conference on Principles and Practice of Knowledge Discovery in Databases, pp. 438-435. 2007.
[9]. Jiawei Han, Jian Pei and Yiwen Yin. Mining Frequent Patterns without Candidate Generation. Proceedings of the ACM SIGMOD International Conference on Management of Data, pp. 1-12. 2000.
[10]. Jochen Hipp, Ulrich Güntzer and Gholamreza Nakhaeizadeh. Algorithms for Association Rule Mining - A General Survey and Comparison. ACM SIGKDD Explorations Newsletter, Vol. 2, Issue 1, pp. 58-64. 2000.
[11]. Matthijs van Leeuwen and Antti Ukkonen. Discovering Skylines of Subgroup Sets. International Journal of Machine Learning and Knowledge Discovery in Databases, Vol. 8190, pp. 272-287. 2013.
[12]. Ming-Yen Lin, Tzer-Fu Tu and Sue-Chen Hsueh. High Utility Pattern Mining Using the Maximal Itemset Property and Lexicographic Tree Structures. International Journal of Information Sciences, Vol. 215, pp. 1-14. 2012.
[13]. Junqiang Liu, Yunhe Pan, Ke Wang and Jiawei Han. Mining Frequent Item Sets by Opportunistic Projection. Proceedings of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 229-238. 2002.
[14]. Michael Mampaey, Nikolaj Tatti and Jilles Vreeken. Tell Me What I Need to Know: Succinctly Summarizing Data with Itemsets. Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 573-581. 2011.
[15]. Kleanthis-Nikolaos Kontonasios and Tijl DeBie. Formalizing Complex Prior Information to Quantify Subjective Interestingness of Frequent Pattern Sets. Proceedings of the 11th International Conference on Advances in Intelligent Data Analysis, pp. 161-171. 2012.
[16]. Fatemeh Nori, Mahmood Deypir and Mohamad Hadi Sadreddini. A Sliding Window Based Algorithm for Frequent Closed Itemset Mining over Data Streams. The Journal of Systems and Software, Vol. 86, Issue 3, pp. 615-623. 2013.
[17]. Matthijs van Leeuwen and Arno Knobbe. Diverse Subgroup Set Discovery. International Journal of Data Mining and Knowledge Discovery, Vol. 25, Issue 2, pp. 208-242. 2012.
[18]. Jefrey Lijffijt, Panagiotis Papapetrou and Kai Puolamäki. A Statistical Significance Testing Approach to Mining the Most Informative Set of Patterns. International Journal of Data Mining and Knowledge Discovery, Vol. 28, Issue 1, pp. 238-263. 2012.
[19]. Y. Okada, T. Tada, K. Fukuta and T. Nagashima. Audio Classification based on a Closed Itemset Mining Algorithm. International Conference on Computer Information Systems and Industrial Management Applications. 2010.
[20]. Nicolas Pasquier, Yves Bastide, Rafik Taouil and Lotfi Lakhal. Discovering Frequent Closed Itemsets for Association Rules. Proceedings of the 7th International Conference on Database Theory, pp. 398-416. 1999.
[21]. Jian Pei, Guozhu Dong, Wei Zou and Jiawei Han. Mining Condensed Frequent-Pattern Bases. International Journal of Knowledge and Information Systems, Vol. 6, Issue 5, pp. 570-594. 2004.
[22]. Matteo Riondato and Eli Upfal. Efficient Discovery of Association Rules and Frequent Itemsets through Sampling with Tight Performance Guarantees. ACM Transactions on Knowledge Discovery from Data, Vol. 8, Issue 4, Article No. 20. 2014.
[23]. Prabha S, Shanmugapriya S and Duraiswamy K. A Survey on Closed Frequent Pattern Mining. International Journal of Computer Applications, Vol. 63, Number 14, pp. 47-52. 2013.
[24]. Nikolaj Tatti and Michael Mampaey. Using Background Knowledge to Rank Itemsets. International Journal of Data Mining and Knowledge Discovery, Vol. 21, Issue 2, pp. 293-309. 2010.
[25]. Yun Unil, Lee Gangin and Ryu Keun Ho. Mining Maximal Frequent Patterns by Considering Weight Conditions over Data Streams. International Journal of Knowledge-Based Systems, Vol. 55, pp. 49-65. 2014.
[26]. Jianyong Wang and George Karypis. On Efficiently Summarizing Categorical Databases. International Journal of Knowledge and Information Systems, Vol. 9, Issue 1, pp. 19-37. 2006.
[27]. Geoffrey I. Webb. Self-Sufficient Itemsets: An Approach to Screening Potentially Interesting Associations between Items. ACM Transactions on Knowledge Discovery from Data, Vol. 4, Issue 1, Article No. 3. 2010.
[28]. Geoffrey I. Webb and Jilles Vreeken. Efficient Discovery of the Most Interesting Associations. ACM Transactions on Knowledge Discovery from Data, Vol. 8, Issue 3, Article No. 15. 2014.
[29]. Yang Xiang, Ruoming Jin, David Fuhry and Feodor F. Dragan. Summarizing Transactional Databases with Overlapped Hyperrectangles. International Journal of Data Mining and Knowledge Discovery, Vol. 23, Issue 2, pp. 215-251. 2011.
[30]. Dong Xin, Jiawei Han, Xifeng Yan and Hong Cheng. Mining Compressed Frequent-Pattern Sets. Proceedings of the 31st International Conference on Very Large Data Bases, pp. 709-720. 2005.
QRCODE
 
 
 
 
 
                                                                                                                                                                                                                                                                                                                                                                                                               
第一頁 上一頁 下一頁 最後一頁 top
無相關期刊