National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)

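Graduate student: Che-wei Yang (楊哲偉)
Thesis title: Distributed Frequent Pattern Mining Using Time-slice Method (分散式系統下使用分時方法進行頻繁模式挖掘)
Advisor: Fan Wu (吳帆)
Degree: Master's
Institution: National Chung Cheng University (國立中正大學)
Department: Graduate Institute of Information Management and Graduate Institute of Healthcare Information Management (資訊管理所暨醫療資訊管理所)
Field: Education (教育學門)
Thesis type: Academic thesis
Year of publication: 2009
Academic year of graduation: 97 (ROC calendar)
Language: English
Number of pages: 36
Keywords (Chinese): frequent pattern (頻繁樣式); Tidset; parallel (平行式)
Keywords (English): Parallel; Frequent pattern; Tidset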
Mining frequent itemsets from databases is a core task in association rule mining (ARM). Many algorithms, such as Apriori-like and FP-growth-like methods, have been proposed to find frequent patterns. However, these approaches are hard to apply today, because databases grow so fast that even the FP-tree may not fit entirely into main memory. Many parallel and distributed algorithms were later proposed to address this problem, but they must scan the database on disk to obtain the counts of infrequent patterns. The Tidset method, based on the Parallel FP-tree algorithm, was then proposed to avoid this.
With Tidsets (sets of transaction IDs), the occurrence counts and the contents of transactions can be obtained directly, without rescanning the database. Although Tidset improves the efficiency of information exchange, many papers, including the Tidset work itself, do not examine how much processing time is spent on the first database scan. This study proposes a time-slice method for the stage in which frequent itemsets are collected at the central node. Furthermore, in the tree exchange stage, local nodes obtain the frequent items and their counts from the central node, and obtain infrequent items and their counts from other local nodes. In this way, we not only improve the utilization of the central node but also avoid the cost of broadcasting transactions.
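The Tidset idea mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the function names `build_tidsets` and `support` and the toy transactions are assumptions made for illustration only. The sketch shows how a single scan can record, for each item, the IDs of the transactions containing it, so that the support of an itemset is later obtained by intersecting those ID sets instead of rescanning the database.

```python
from collections import defaultdict


def build_tidsets(transactions):
    """Scan the transactions once; map each item to the set of transaction IDs containing it."""
    tidsets = defaultdict(set)
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets[item].add(tid)
    return dict(tidsets)


def support(itemset, tidsets):
    """Count the transactions that contain every item in `itemset` by intersecting tidsets."""
    sets = [tidsets.get(item, set()) for item in itemset]
    if not sets:
        return 0
    return len(set.intersection(*sets))


# Toy data, invented for illustration only (hypothetical, not from the thesis).
transactions = [
    {"a", "b", "c"},
    {"a", "c"},
    {"b", "d"},
    {"a", "b", "c", "d"},
]
tidsets = build_tidsets(transactions)
print(support({"a", "c"}, tidsets))  # 3
print(support({"b", "d"}, tidsets))  # 2
```

In a distributed setting of the kind the abstract describes, each local node would hold tidsets for its own partition only; how counts are exchanged between local nodes and the central node is specific to the thesis and is not modelled here.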
Chapter 1 Introduction
1.1 Background
1.2 Motivation
1.3 The concept of our method
Chapter 2 Related Work
2.1 Apriori algorithm
2.2 Frequent pattern growth algorithm
2.3 Parallel data mining
2.4 Parallel FP-tree algorithm
2.5 Tidset based FP-tree algorithm
Chapter 3 Our method
3.1 Basic Notation
3.2 Time slice transmitting
3.2.1 Transmitting to central node
3.2.2 Getting request message
3.2.3 Modification of request message
3.2.4 Transaction sets
3.3 Transactions with array list
Chapter 4 Experimental Results
Chapter 5 Discussion
Chapter 6 Conclusion and Future Work
References
[1] Agrawal, R. and Srikant, R., “Fast Algorithms for Mining Association Rules,” VLDB, 1994: p. 489-499.
[2] Agrawal, R. and J.C. Shafer, “Parallel Mining of Association Rules,” IEEE Transactions on Knowledge and Data Engineering, 1996. 8(6): p. 962-969.
[3] David W. Cheung, Jiawei Han, Vincent T. Ng, Ada W. Fu and Yongjian Fu, “A Fast Distributed Algorithm for Mining Association Rules,” in Proceedings of the Fourth International Conference on Parallel and Distributed Information Systems, 1996.
[4] Zhou, J. and K.-M. Yu, “Tidset-Based Parallel FP-tree Algorithm for the Frequent Pattern Mining Problem on PC Clusters,” in Advances in Grid and Pervasive Computing, 2008. p. 18-28.
[5] Han, J., J. Pei, and Y. Yin, “Mining Frequent Patterns without Candidate Generation,” ACM SIGMOD Record, 2000. 29(2): p. 1-12.
[6] Jiawei Han, Jian Pei, Yiwen Yin and Runying Mao, “Mining Frequent Patterns without Candidate Generation: A Frequent-Pattern Tree Approach,” Data Mining and Knowledge Discovery, 2004. 8(1): p. 53-87.
[7] Javed, A. and A. Khokhar, “Frequent Pattern Mining on Message Passing Multiprocessor Systems,” Distributed and Parallel Databases, 2004: p. 321-334.
[8] Bo He, Yue Wang, Wu Yang and Yuan Chen, “Fast Algorithm for Mining Global Frequent Itemsets Based on Distributed Database,” in Rough Sets and Knowledge Technology, 2006. p. 415-420.
[9] Schuster, A. and R. Wolff, “Communication-Efficient Distributed Mining of Association Rules,” Data Mining and Knowledge Discovery, 2004. 8: p. 171-196.
[10] Dora Souliou, Aris Pagourtzis and Nikolaos Drosinos, “Computing Frequent Itemsets in Parallel Using Partial Support Trees,” Journal of Systems and Software, 2006. 79(12): p. 1735-1743.