跳到主要內容

臺灣博碩士論文加值系統

(216.73.216.152) 您好!臺灣時間:2025/11/02 12:59
字體大小: 字級放大   字級縮小   預設字形  
回查詢結果 :::

詳目顯示

我願授權國圖
: 
twitterline
研究生:張志宏
研究生(外文):Zhi-Hong Chang
論文名稱:MapReduce架構中支援大規模資料之結合運算
論文名稱(外文):Join Operations for Large-Scale Data on MapReduce Framework
指導教授:陳世穎陳世穎引用關係
學位類別:碩士
校院名稱:國立臺中科技大學
系所名稱:資訊工程系碩士班
學門:工程學門
學類:電資工程學類
論文種類:學術論文
論文出版年:2012
畢業學年度:100
語文別:中文
論文頁數:65
中文關鍵詞:HadoopMapReduceθ-Join星型結合
外文關鍵詞:HadoopMapReducetheta-joinstar-join
相關次數:
  • 被引用被引用:0
  • 點閱點閱:251
  • 評分評分:
  • 下載下載:0
  • 收藏至我的研究室書目清單書目收藏:1
由於硬體技術與網路技術的進步,以及各類應用的需求,雲端計算成為一個重要的研究方向;其中大規模資料處理是一個重要的問題。大規模資料處理通常會使用平行資料處理架構(data-parallel framework)來處理,因為這種架構(framework)可以有效率的應付大規模的資料,並且常被用於資料探勘與資料倉儲的大規模資料處理。MapReduce是此架構的一種,它透過Map與Reduce兩個階段來達成工作。其中Scatter-Gather-Merge(SGM)是基於MapReduce架構上支援資料倉儲當中最常見的星型結合的高效率演算法。然而SGM只有支援equi-join,本文將基於設計支援θ-Join(即equi- 與nonequi- join)的處理架構,以支援更多查詢類型與應用。此外由於nonequi-join的運算會有大規模的資料輸入與輸出,本文提出的方法TJSGM與PRM亦設計在MapReduce的架構下藉著減少負載平衡可以解決大規模資料傳輸的問題,進而提升查詢效能。實驗結果顯示,在equi-join 下,我們提出的方法跟SGM擁有相似的執行效率。而在nonequi-join下,我們也展示了不同的選擇比例下的執行效能。

As the rapid development of hardware and network technology, cloud computing has become an important research topic. It provides a solution for large-scale data processing problems. The data-parallel framework provides a platform to deal with large-scale data, especially for data mining and data warehousing. MapReduce is one of the most famous data-parallel frameworks. It consists of two stages: the Map stage and the Reduce stage. Based on the MapReduce framework, Scatter-Gather-Merge (SGM) is an efficient algorithm supporting star join queries, which is one of the most important query types in the data warehouse. However, SGM only supports equi-join operations. This thesis proposes a framework, which supports not only equi-join operations but also nonequi-join operations. And, the nonequi-join processing usually causes a large amount of I/Os, the proposed method can resolve this problem by reducting the cost of load balancing. Our experimental results show that, for equi-join operations, our method has similar execution time compared to SGM. For non-equi join operations, we also illustrate the performances under different conditions.

中文摘要 I
英文摘要 II
誌 謝 III
目 錄 IV
表目錄 VII
圖目錄 VIII
一、 緒論 1
1.1 研究背景 1
1.2 研究目的 2
1.3 論文架構 2
二、 相關研究 4
2.1 HADOOP MAPREDUCE 4
2.2 KEY PARTITION FUNCTION 6
2.2.1 HashPartitioner 7
2.2.2 TotalOrderPartitioner 8
2.2.3 KeyFieldBasedPartitioner 9
2.3 BLOOM FILTER 11
2.4 MULTIWAY JOIN 12
2.5 SCATTER-GATHER-MERGE機制 14
2.5.1 SGM的鍵值操縱(keymanipulation) 14
2.5.2 SGM工作流程 15
三、 問題與方法 16
3.1 問題分析 16
3.2 方法設計 19
3.2.1 鍵值操縱(key manipulation) 20
3.2.2 Distributed cache 21
3.2.3 覆寫自訂鍵值的equals函式 23
3.2.4 小結 25
四、 TJSGM(THETA-JOIN SCATTER-GATHER-MERGE)機制 26
4.1 設計原理 26
4.2 設計鍵值分區函式(KEY PARTITION FUNCTION) 26
4.3 FILTER階段 28
4.4 SCATTER-AND-GATHER階段 30
4.5 MERGE階段 35
4.6 AGGREGATE階段 36
五、 PRM(PARTITION-REPLICATION-MERGE)機制 38
5.1 設計原理 38
5.2 設計鍵值分區函式(KEY PARTITION FUNCTION) 39
5.3 FILTER階段 40
5.4 PARTITION-AND-REPLICATION階段 42
5.5 MERGE階段 44
5.6 AGGREGATE階段 45
六、 範例說明 46
6.1 TJSGM範例 46
6.2 PRM範例 48
七、 實驗結果與分析 51
7.1 實驗環境 51
7.2 STAR SCHEMA BENCHMARK 52
7.3 實驗結果 53
7.3.1 Equi-join查詢效能 53
7.3.2 不同區塊大小對equi-join的影響 55
7.3.3 不同Reducer數量對equi-join的影響 56
7.3.4 Theta-join查詢效能 58
八、 結論 62
參考文獻 63

一、英文參考文獻
[1]D. J. Abadi, P. A. Boncz, and S. Harizopoulos, “Column-Oriented Database Systems,” in Proceedings of the VLDB Endowment, Volume 2, Issue 2, Pages 1664-1665, 2009.
[2]A. Abelló, J. Ferrarons and O. Romero, “Building Cubes with MapReduce,” in Proceedings of the ACM 14th international workshop on Data Warehousing and OLAP, Pages 17-24, 2011.
[3]F. N. Afrati and J. D. Ullman, “Optimizing Joins in a Map-Reduce Environment,” in Proceedings of the 13th International Conference on Extending Database Technology, Pages 99-110, 2010.
[4]F. N. Afrati and J. D. Ullman, “Optimizing multiway joins in a map-reduce environment,” IEEE Transactions on Knowledge and Data Engineering, Volume 23, Issue 9, Pages 1282-1298, 2011.
[5]B. Bloom, “Space/time trade-offs in hash coding with allowable errors,” in Communications of the ACM, Volume 13, Issue 7, Pages 422-426, 1970.
[6]J. Chandar, “Join Algorithms using Map/Reduce,” http://www.inf.ed.ac.uk/publications/thesis/online/IM100859.pdf, 2010.
[7]S. Chaudhuri and U. Dayal, “An overview of data warehousing and OLAP technology,” in ACM SIGMOD Record, Volume 26, Issue 1, Pages 65-74, 1997.
[8]S. Chen, “Cheetah: A high performance, custom data warehouse on top of mapreduce,” in Proceedings of the VLDB Endowment, Volume 3, Issue 1-2, Pages 1459-1468, 2010.
[9]J. Dean and S. Ghemawat, “MapReduce: Simplied Data Processing on Large Clusters,” in Communications of the ACM, Volume 51, Issue 1, Pages 107-113, 2008.
[10]S. Ghemawat, H. Gobioff. and S. T. Leung, “The Google File System,” in Proceedings of the nineteenth ACM symposium on Operating systems principles, Pages 29-43, 2003.
[11]Y. Gu and R. L. Grossman, “Sector and Sphere: the design and implementation of a high-performance data cloud,” Philosophical transactions. Series A, Mathematical, physical, and engineering sciences, Volume 367,Issue 1897, Pages 2429-2445, 2009.
[12]H. Han, H. Jung, H. Eom and H. Y. Yeom, “Scatter-Gather-Merge: An Efficient Star-Join Query Processing Algorithm for Data-Parallel Frameworks,” Cluster Computing, Volume 14, Issue 2, Pages 183-197, 2011.
[13]M. Isard, M. Budiu, Y. Yu, A. Birrell, D. Fetterly, “Dryad: Distributed Data-Parallel Programs from Sequential Building Blocks,” in Proceedings of the 2nd ACM SIGOPS/EuroSys European Conference on Computer Systems, Pages 59-72, 2007.
[14]D. Jiang, A. K. H. Tung, and G. Chen, “Map-join-reduce: Towards scalable and efficient data analysis on large clusters,” IEEE Transactions on Knowledge and Data Engineering, Volume 23, Issue 9, Pages 1299-1311, 2011.
[15]P. O’Neil, G. Graefe, “Multi-Table Joins through Bitmapped Join Indices,” in ACM SIGMOD Record, Volume 24, Issue 3, Pages 8-11, 1995.
[16]P. O’Neil, B. O’Neil and X. Chen, “Star Schema Benchmark,” http://www.cs.umb.edu/~poneil/StarSchemaB.PDF , 2007.
[17]A. Thusoo, J. S. Sarma, N. Jain, Z. Shao, P. Chakka, S. Anthony, H. Liu, P. Wychoff and R. Murthy, “Hive - a warehousing solution over a map-reduce framework,”in Proceedings of the VLDB Endowment, Volume 2, Issue 2, Pages 1626-1629, 2009.
[18]H. C. Yang, A. Dasdan, R. L. Hsiao, D. S. Parker, “Map-Reduce-Merge Simplified Relational Data Processing on Large Clusters,” Proceedings of the ACM SIGMOD international conference on Management of data, Pages 1029-1040, 2007.
[19]J. Venner, “Pro Hadoop,” Apress, 2009.
[20]T. White, “Hadoop: The Definitive Guide,” O''Reilly Media, 2010.
二、網頁參考文獻
[21]Apache hadoop, http://hadoop.apache.org
[22]Apache hive, http://hadoop.apache.org/hive
[23]Apache pig, http://pig.apache.org/


QRCODE
 
 
 
 
 
                                                                                                                                                                                                                                                                                                                                                                                                               
第一頁 上一頁 下一頁 最後一頁 top
無相關期刊