跳到主要內容

臺灣博碩士論文加值系統

(44.192.22.242) 您好!臺灣時間:2021/08/01 13:57
字體大小: 字級放大   字級縮小   預設字形  
回查詢結果 :::

詳目顯示

: 
twitterline
研究生:張韋豪
研究生(外文):Wei-Hao Chang
論文名稱:從資料流中探勘頻繁封閉項目集與不冗餘關聯規則之有效率的演算法
論文名稱(外文):An Efficient Algorithm for Mining Frequent Closed Itemsets and Non-Redundant Association Rules in Data Streams
指導教授:顏秀珍顏秀珍引用關係
指導教授(外文):Show-Jane Yen
學位類別:碩士
校院名稱:銘傳大學
系所名稱:資訊工程學系碩士班
學門:工程學門
學類:電資工程學類
論文種類:學術論文
論文出版年:2008
畢業學年度:96
語文別:中文
論文頁數:97
中文關鍵詞:封閉項目集不冗餘關聯規則關聯規則探勘資料流
外文關鍵詞:Association rules miningData streamsClosed itemsetsNon-redundant association rules
相關次數:
  • 被引用被引用:0
  • 點閱點閱:194
  • 評分評分:
  • 下載下載:26
  • 收藏至我的研究室書目清單書目收藏:0
關聯規則探勘 (Association Rules Mining) 是要從交易資料庫 (Transaction Database) 中分析出常常一起被購買的商品以及顧客購買某些商品也可能會購買其它某些商品的行為。傳統探勘關聯規則的演算法所探勘的交易資料都是固定的,當新的交易資料不斷產生以及舊有的交易資料依使用者需求而移除時,必須連同原始資料重新探勘,浪費許多時間重新找尋舊有的資訊。而這種隨時間不斷產生新的資料與移除舊有的資料稱之為資料流 (Data Streams) 。因此,有些學者提出從資料流的環境下,找出所有的頻繁項目集與關聯規則。然而所產生的項目集與關聯規則的數量可能非常龐大,太多資訊反而會造成使用者的困擾,無法做決策。因此近年來有學者提出封閉項目集 (Closed itemsets) 的觀念,若一項目集的支持度計數 (Support count) 比其所有超集合 (Superset) 的支持度計數都大,則此項目集為封閉項目集,由頻繁封閉項目集產生的關聯規則稱為不冗餘關聯規則 (Non-redundant association rules) ,由頻繁封閉項目集與不冗餘關聯規則可得到所有頻繁項目集與關聯規則的資訊。在這篇論文中,我們提出一個有效率的演算法 FCIAR ,當資料新增或被移除時,不需要掃描原始資料庫,只需將新增或被移除的資料與之前產生的封閉項目集做運算,就可產生更新資料庫後的封閉項目集和不冗餘關聯規則。在實驗的部分,我們會與同樣在資料流中探勘頻繁封閉項目集的 CFI-Stream 演算法做執行時間與所使用空間上的比較。經由實驗可以證實本研究所提出的演算法相當有效率。
It can analyze customer’s behaviors who bought products from transaction databases by association rules mining. Traditional algorithms for association rules mining can mine the frequent itemsets over static transactional databases. When we insert a new transaction or delete a transaction, we have to remine the entire updated database. It’s costly in both time and space requirement. Data comes with high speed, unbound, and continuous. So some scholars propose an idea to find all frequent itemsets and association rules in data streams. However, the amount of frequent itemsets and association rules are large. Too much information will confuse users and doesn’t make a good decision. So some scholars present the concept of saving closed itemsets recently. If their supports are differences between the itemset and its all supersets, the itemset is a closed itemset. We can get non-redndant association rules by frequent closed itemsets. Then we can get all frequent itemsets and association rules by frequent closed itemsets and non-redundant association rules. We propose an algorithm FCIAR in the paper. When transactions are added or deleted, we just make some operations with transactions and closed itemsets produced without scanning original transaction databases. Then we can produce new closed itemsets and non-redundant association rules for update transaction databases. We will compare FCIAR with CFI-Stream in execution time and memory usage. It proves that our approach is very efficient by the result of experiments.
中文摘要 ii
英文摘要 iv
致 謝 vi
目錄 vii
表目錄 viii
圖目錄 ix
第一章 緒論 1
第二章 相關工作 5
第三章 FCIAR 演算法 18
3.1 封閉項目集產生器 22
3.1.1 封閉項目集產生器實例說明 24
3.1.2 封閉項目集產生器虛擬碼 25
3.2 FCIAR-Add 演算法 27
3.2.2 FCIAR-Add 虛擬碼 39
3.3 FCIAR-Del 演算法 42
3.3.1 FCIAR-Del 虛擬碼 46
3.4 從頻繁封閉項目集產生不冗餘關聯規則 47
3.4.1 產生不冗餘關聯規則虛擬碼 48
3.5 FCIAR 演算法資料儲存結構 49
1. 雜湊表的設計 49
2. 項目編號鏈結 50
3. 使用雜湊表應用於封閉項目集產生器 50
第四章 實驗結果與效能比較 53
4.1 人造資料集實驗結果 54
4.2 真實資料集實驗結果 70
4.3 人造資料集 Scalable 實驗結果 77
4.4 產生不冗餘關聯規則測試 79
第五章 結論與未來研究工作 81
參考文獻 82
1.R. Agrawal and R. Srikant (1994), Fast Algorithm for Mining Association Rules, In Proceedings of International Conference on Very Large Data Bases, 1994, pages 487–499.
2.R. Agrawal and R. Srikant (1995), Mining Sequential Patterns, Proceeding of International Conference on Data Engineering, pages 3–14.
3.Y. Chi, H. Wang, P. S. Yu, and R. R. Muntz (2004), Moment: Maintaining Closed Frequent Itemsets over a Stream Sliding Window, Proceedings of the 2004 IEEE International Conference on Data Mining (ICDM''04), pp. 59-66.
4.J. Han, J. Pei, and Y. Yin (2000), Mining Frequent Patterns without Candidate Generation, Proc. 2000 ACM-SIGMOD Int’l Conf. Management of Data (SIGMOD ’00), May, pp. 1–12.
5.Jiawei Han, Jianyong Wang, Ying Lu and Tzvetkov P. (2002), Mining top-k frequent closed patterns without minimum support, Proceedings of 2002 IEEE International Conference on Data Mining (ICDM), 2002, pp. 211-218.
6.N. Jiang and L. Gruenwald (2006), CFI-Stream: Mining Closed Frequent Itemsets in Data Streams, Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 592–597.
7.R. Jin, and G. Agrawal (2005), An Algorithm for In-Core Frequent Itemset Mining on Streaming Data, Proc. of the 5th IEEE Int''l Conf. on Data Mining, pp. 210–217.
8.H. F. Li, S. Y. Lee, and M. K. Shan (2004), An Efficient Algorithm for Mining Frequent Itemsets over the Entire History of Data Streams, in the Proc. of First International Workshop on Knowledge Discovery in Data Streams, to be held in conjunction with the 15th European Conference on Machine Learning (ECML-2004) and the 8th European Conference on the Principals and Practice of Knowledge Discovery in Databases (PKDD).
9.G.S. Manku and R. Motwani (2002), Approximate Frequency Counts over Data Streams, Proceedings of the 28th International Conference on Very Large Data Bases (VLDB), pp. 346–357.
10.Pasquier Nicolas, Bastide Yves, Taouil Rafik and Lakhal Lotfi (1999), Efficient mining of association rules using closed itemset lattices, Proceedings of Information Systems, 1999, pp. 25-46.
11.J.S. Park, M.S. Chen, and P.S. Yu (1995), An Effective Hash Based Algorithm for Mining Association Rules, ACM International Conference on Management of Data (SIGMOD), pages 175–186.
12.N. Pasquier, Y. Bastide, R. Taouil and L. Lakhal (1999), Discovering Frequent Closed Itemsets for Association Rules, Proc. of the 7th Int. Conf. on Database Theory, pp. 398–416.
13.J. Pei, J. Han, and R. Mao (2000), CLOSET: An Efficient Algorithm for Mining Frequent Closed Itemsets, Proc. of ACM SIGMOD Int. Workshop on Data Mining and Knowledge Discovery, pp. 21–30.
14.J. Pei, J. Han, H. Lu, S. Nishio, S. Tang, and D. Yang (2001), H-Mine: Hyper-Structure Mining of Frequent Patterns in Large Databases, Proceedings IEEE International Conference, pp. 441–448.
15.J. Pei, J. Han, B. Mortazavi-Asl, H. Pinto, Q. Chen, U. Dayal, and M.-C. Hsu (2001), PrefixSpan: Mining Sequential Patterns Efficiently by Prefix-Projected Pattern Growth, Proc. 2001 Int’l Conf. Data Eng. (ICDE ’01), pp. 215–224.
16.Songram P., Boonjing V. and Intakosum S. (2006), Closed Multidimensional Sequential Pattern Mining, Proceedings of Third International Conference on Information Technology: New Generations (ITNG), 2006, pp. 512-517.
17.Singh N.G., Singh S.R. and Mahanta, A.K. (2005), CloseMiner: discovering frequent closed itemsets using frequent closed tidsets, Proceedings of Fifth IEEE International Conference on Data Mining (ICDM), 2005.
18.Boonjing Veera and Songram Panida (2007), Efficient Algorithms for Mining Closed Multidimensional Sequential Patterns, Proceedings of Fourth International Conference on Fuzzy Systems and Knowledge Discovery (FSKD), 2007, pp. 749-753.
19.J. Wang, J. Han, and J. Pei (2003), CLOSET+: Searching for the Best Strategies for Mining Frequent Closed Itemsets, Proc. of the 9th ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining, pp. 236–245.
20.Hai Wang, Wenyuan Li, Zeng-zhi Li and Lin Fan (2005), Finding Closed Itemsets in Data Streams, Proceedings of Knowledge-Based Intelligent Information and Engineering Systems, 2005, pp. 964-971.
21.M.J. Zaki and C.J. Hsiao (1999), CHARM: An Efficient Algorithm for Closed Association Rule Mining, In Technical Report 99-10, Computer Science, Rensselaer Polytechnic Institute, pp. 1–24.
22.M.J. Zaki and C.J. Hsiao (2002), CHARM: An Efficient Algorithm for Closed Itemset Mining, Proc. of the SIAM Int. Conf. on Data Mining, pp. 99.
QRCODE
 
 
 
 
 
                                                                                                                                                                                                                                                                                                                                                                                                               
第一頁 上一頁 下一頁 最後一頁 top