
National Digital Library of Theses and Dissertations in Taiwan


Detailed Record

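Author: 李家宏
Author (English): Jia-Hong Li
Title (Chinese): 雲端運算之關聯式規則資料探勘技術
Title (English): Using MapReduce Framework for Mining Association Rules
Advisor: 陳世穎
Advisor (English): Shih-Ying Chen
Degree: Master's
Institution: National Taichung University of Science and Technology
Department: Master's Program, Department of Computer Science and Information Engineering
Field: Engineering
Discipline: Electrical and Computer Engineering
Thesis type: Academic thesis
Year of publication: 2013
Graduation academic year: 101 (ROC calendar)
Language: Chinese
Pages: 62
Keywords: inclusion-exclusion principle; association rules; data mining; MapReduce; cloud computing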
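Advances in technology have driven rapid development of computer hardware and networking, and demand for related applications has grown just as quickly, so research on cloud computing has drawn increasing attention. Association rule mining is an important topic in data mining; its usual goal is to find relationships among product items and so give merchants useful input for their decisions. Most of the time in association rule mining is spent computing frequent itemsets, and many algorithms have been proposed for this task. When the data volume is very large, a single machine may be unable to mine association rules effectively. One efficient algorithm, Principle of Inclusion-Exclusion and Transaction Mapping (PIETM), combines Apriori's advantage of pruning candidate itemsets in advance with FP-growth's advantage of scanning the database only twice, and it uses the principle of inclusion-exclusion to compute all frequent itemsets. To handle association rule mining on large data sets, this thesis proposes a parallel PIETM algorithm based on the MapReduce framework, so that PIETM can process larger transaction databases in parallel. Experiments show that, with suitably tuned MapReduce parameters, the parallel PIETM algorithm is highly feasible for larger transaction databases.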
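As a rough illustration of how the principle of inclusion-exclusion can yield an itemset's support, the sketch below counts the transactions that contain every item of an itemset using only the sizes of unions of per-item transaction-id sets. It is not the thesis's PIETM implementation: the function names, the toy database, and the use of plain Python sets in place of PIETM's transaction mapping are all assumptions made for this example.

```python
from itertools import combinations


def build_tidsets(transactions):
    """Map each item to the set of transaction ids that contain it."""
    tidsets = {}
    for tid, items in transactions.items():
        for item in items:
            tidsets.setdefault(item, set()).add(tid)
    return tidsets


def support_by_inclusion_exclusion(itemset, tidsets):
    """Count transactions containing every item in `itemset`.

    Uses the identity
        |A1 & ... & Ak| = sum over non-empty subsets S of
                          (-1)^(|S|+1) * |union of Ai for i in S|
    where Ai is the transaction-id set of item i.
    """
    items = sorted(itemset)
    total = 0
    for size in range(1, len(items) + 1):
        sign = 1 if size % 2 == 1 else -1
        for subset in combinations(items, size):
            union = set().union(*(tidsets[i] for i in subset))
            total += sign * len(union)
    return total


# Hypothetical toy database: transaction id -> items bought.
transactions = {
    1: {"a", "b", "c"},
    2: {"a", "c"},
    3: {"b", "c"},
    4: {"a", "b"},
    5: {"a", "b", "c"},
}
tidsets = build_tidsets(transactions)
support_ab = support_by_inclusion_exclusion({"a", "b"}, tidsets)
support_abc = support_by_inclusion_exclusion({"a", "b", "c"}, tidsets)
```

The sketch enumerates every non-empty subset of the itemset, so its cost grows exponentially with itemset size; it is meant only to show the counting identity, not to be efficient.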

With the rapid development of computer hardware and network technologies, demand for related applications has grown, and cloud computing has recently become a very popular research area. Association rule mining is an important topic in data mining.
Association rules are useful for discovering relationships among different products and can support better decisions by decision makers. In association rule mining, the computational load of discovering all frequent itemsets in a transaction database is considerably high. Some researchers have shown that mining such big data on a single machine may be computationally infeasible or ineffective.
Principle of Inclusion-Exclusion and Transaction Mapping (PIETM) combines the strengths of two well-known algorithms, Apriori and FP-Growth: Apriori joins and prunes candidate itemsets, and FP-Growth scans the database only twice. PIETM mines frequent itemsets recursively by means of the principle of inclusion-exclusion.
To process big data, this thesis presents a novel PIETM algorithm based on the MapReduce framework, which runs in parallel and is suited to large transaction databases.
The experimental results show that, after the parameters of the MapReduce framework are adjusted, the proposed PIETM algorithm processes big data efficiently.
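The MapReduce style of computation that the abstract relies on can be sketched for the simplest step of frequent itemset mining, counting frequent 1-itemsets. The code below is a single-process simulation in plain Python, not Hadoop code and not the thesis's job definitions; `map_phase`, `shuffle`, `reduce_phase`, and the sample transactions are hypothetical names and data chosen for illustration.

```python
from collections import defaultdict


def map_phase(transaction):
    """Emit (item, 1) once for each distinct item in a transaction."""
    for item in set(transaction):
        yield item, 1


def shuffle(pairs):
    """Group emitted values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(item, counts, min_support):
    """Sum the counts for one item and keep it only if it is frequent."""
    total = sum(counts)
    if total >= min_support:
        yield item, total


def frequent_1_itemsets(transactions, min_support):
    """Run map, shuffle, and reduce in sequence over all transactions."""
    pairs = (pair for t in transactions for pair in map_phase(t))
    result = {}
    for item, counts in shuffle(pairs).items():
        for key, total in reduce_phase(item, counts, min_support):
            result[key] = total
    return result


# Hypothetical sample data with an absolute minimum support of 3.
sample = [["a", "b", "c"], ["a", "c"], ["b", "c"], ["a", "b"], ["a", "d"]]
frequent = frequent_1_itemsets(sample, min_support=3)
```

In a real cluster the map calls would run on separate input splits and the reduce calls on separate keys, which is what lets this counting step scale to transaction databases that do not fit on one machine.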


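Chinese Abstract
English Abstract
Acknowledgments
Table of Contents
List of Figures
List of Tables
1. Introduction
1.1 Research Background
1.2 Research Objectives
1.3 Thesis Organization
2. Related Work
2.1 MapReduce
2.2 Association Rules
2.2.1 Apriori Algorithm
2.2.2 FP-growth Algorithm
2.2.3 Online Combinatorial Approximation (OCA) Algorithm
2.2.4 Principle of Inclusion-Exclusion and Transaction Mapping
3. Problem Analysis and Method Design
3.1 Problem Analysis
3.1.1 Limitations of PIETM on a Single Machine
3.1.2 Problems in Parallelizing PIETM
3.2 Research Method
3.2.1 Distributed Cache
3.2.2 Parallel PIETM Algorithm
4. Worked Example
4.1 Phase 1: Computing Frequent 1-Itemsets
4.2 Phase 2: Building the Transaction Interval Table
4.3 Phase 3: Iteratively Mining Frequent Itemsets of Each Order
5. Experimental Results
5.1 Experimental Environment
5.2 Experimental Design
5.3 Experimental Results
5.3.1 Performance under Different Minimum Support Values
5.3.2 Performance under Different Transaction Database Characteristics
5.3.3 Comparison of the Transaction Interval Table MapReduce Execution Time with the Total Mining Time
5.3.4 Effect of the Number of Reducers on Building the Transaction Interval Table
5.3.5 Effect of the Number of Map Tasks
5.3.6 Effect on Execution Time of Adjusting the Number of Map Tasks in the Frequent 2-Itemset MapReduce Job
6. Analysis and Discussion of Experiments
7. Conclusions and Future Work
References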


