National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)

Detailed Record
Author: Chang, Chih-Wei (張智崴)
Title: An Adaptive Real-Time MapReduce Framework Based on Locutus and Borg-Tree (基於 Locutus 與 Borg-Tree 的即時性 MapReduce 運算機制)
Advisor: Chao, Chia-Cheng (趙嘉成)
Oral defense committee: Chao, Chia-Cheng (趙嘉成); Chen, Chih-Kai (陳智凱); Huang, Chiung-Yu (黃瓊玉)
Oral defense date: 2013-01-05
Degree: Master's
Institution: National Taipei University of Education (國立臺北教育大學)
Department: Master's Program, Department of Computer Science (資訊科學系碩士班)
Discipline: Engineering
Academic field: Electrical and Information Engineering
Thesis type: Academic thesis
Publication year: 2013
Graduation academic year: 101 (2012-2013)
Language: Chinese
Pages: 46
Keywords: MapReduce, Big Data (巨量資料), Distributed Computing (分散式運算), Borg-Tree
Cited by: 0
Views: 325
Downloads: 0
Bookmarks: 0
Chinese Abstract (translated)

Since Google published the MapReduce design in 2004, the framework has matured over many years of development, and in 2011 Apache finally released Hadoop 1.0. Although the open-source MapReduce ecosystem is now mature enough to support enterprise operations and applications, two urgently needed capabilities remain undeveloped: support for real-time computation, and cross-platform deployment with ease of use. Building on prior analyses of Hadoop's performance bottlenecks, this thesis proposes improvements aimed at a real-time computing platform that is easy to deploy and apply.

We first draw on prior studies of MapReduce performance bottlenecks, which target the poor access speed of HDFS and the performance issues of the Zookeeper coordination mechanism. We hypothesize that replacing HDFS with a shared-memory mechanism and replacing Zookeeper with Locutus can raise performance substantially, enough to support real-time analytic applications.

In this thesis, we propose a MapReduce mechanism built around Borg-Tree and Locutus, which distributes work rapidly through P2P-style diffusion, and we validate it with a Node.js implementation on a cloud platform.

Finally, we evaluate the feasibility of this prototype experimentally. Although the measured results fall short of the expected performance, they also pinpoint flaws in the protocol and the shared-memory mechanism that can guide subsequent research.
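The tree-structured, P2P-style diffusion described above can be illustrated with a generic sketch. This is not the thesis's actual Borg-Tree construction (which is defined in Chapter 3); it only shows the general property such topologies exploit: if every node that has received a task forwards it to a fixed number of peers per round, the task reaches n nodes in logarithmically many rounds instead of n sequential dispatches. The function name `diffusionRounds` and its parameters are illustrative.

```javascript
// Generic sketch of tree-based task diffusion (NOT the thesis's
// Borg-Tree construction; fanout and node count are hypothetical).
function diffusionRounds(totalNodes, fanout) {
  let informed = 1; // the coordinator starts out holding the task
  let rounds = 0;
  while (informed < totalNodes) {
    // Every informed node forwards the task to `fanout` new peers,
    // so the informed set multiplies by (1 + fanout) each round.
    informed += informed * fanout;
    rounds++;
  }
  return rounds;
}

console.log(diffusionRounds(1024, 1)); // binary doubling reaches 1024 nodes in 10 rounds
console.log(diffusionRounds(1024, 2)); // fanout 2 reaches them in 7 rounds
```

With fanout k the informed set grows by a factor of (k + 1) per round, so the round count is ⌈log₍k+1₎ n⌉, which is why such diffusion can dispatch work far faster than a single coordinator contacting each node in turn.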
English Abstract

Google released the design of MapReduce in 2004. After years of development, Apache finally launched Hadoop 1.0 in 2011, meaning that the open-source MapReduce ecosystem had become mature enough to support business applications. Nevertheless, some features needed for big-data processing remained unsatisfied: first, support for real-time computing; second, cross-platform deployment and ease of use. In this thesis, we analyze the bottlenecks in Hadoop's performance, attempt to resolve them, and aim to develop an easily deployed real-time computing platform.

We cite research identifying the access speed of HDFS and the performance of Zookeeper as the main MapReduce bottlenecks. This implies that if we improve the coordination mechanism and replace HDFS with a faster storage mechanism (we use shared memory), we can improve performance significantly, enough to support real-time analysis applications.

In this thesis, we propose an algorithm based on Locutus and Borg-Tree to coordinate MapReduce. It is structured as a P2P topology that spreads distributed processing quickly across nodes, and it is programmed in Node.js so that it can be deployed easily to many cloud platforms.

Finally, we ran experiments to assess the feasibility of our prototype. Although the prototype did not reach the expected performance, the experiments pinpoint problems in the shared-memory mechanism and our protocol for subsequent research and development.
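The MapReduce model the thesis builds on can be sketched as a minimal single-process word count. This is illustrative only: in the thesis the map, shuffle, and reduce phases run across distributed nodes coordinated by Locutus, whereas here the phases are plain functions, and the names `mapPhase`, `shuffle`, and `reducePhase` are assumptions, not identifiers from the thesis.

```javascript
// Single-process sketch of the MapReduce programming model (word count).
function mapPhase(docs) {
  const pairs = [];
  for (const doc of docs)
    for (const word of doc.split(/\s+/).filter(Boolean))
      pairs.push([word, 1]); // each mapper emits (key, value) pairs
  return pairs;
}

function shuffle(pairs) {
  const groups = new Map(); // group all values under the same key
  for (const [key, value] of pairs) {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  }
  return groups;
}

function reducePhase(groups) {
  const out = {};
  for (const [key, values] of groups)
    out[key] = values.reduce((a, b) => a + b, 0); // sum counts per key
  return out;
}

const counts = reducePhase(shuffle(mapPhase(["big data", "big tree"])));
console.log(counts); // { big: 2, data: 1, tree: 1 }
```

In a distributed deployment the shuffle step is the expensive part, which is why the thesis focuses on the storage layer (HDFS versus shared memory) and the coordination layer (Zookeeper versus Locutus) that carry it.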
Table of Contents

Chapter 1 Introduction 1
1.1 Motivation 1
1.2 Background 1
1.3 Objectives, Scope, and Methods 3
1.3.1 Research Questions 5
1.3.2 Scope 5
1.3.3 Methods 5
1.3.4 Algorithm 6
1.3.5 Borg-Tree Topology 7
1.3.6 Experimental Method 7
1.3.7 Expected Contributions 7

Chapter 2 Literature Review 8
2.1 MapReduce Architecture 8
2.2 Hadoop Shuffle Model 9
2.3 Hadoop Architecture 10
2.4 HDFS Module 11
2.5 HBase Module 13
2.6 MapReduce Types and Formats 14

Chapter 3 Expert Interviews and System Design 17
3.1 The Zookeeper Solution 17
3.2 Locutus Fast Coordination Algorithm 19
3.2.1 Borg-Tree Topology 19
3.2.2 Borg-Tree Construction 20
3.3 Topology Properties 24
3.4 Locutus Fast Coordination Algorithm 26

Chapter 4 Findings and Discussion 30
4.1 Innovations and Findings 30
4.1.1 Locutus Operating Architecture 30
4.1.2 TeraSort Algorithm and Workflow 33
4.1.3 High-Performance MapReduce Design 36
4.1.4 XTrieTree Partitioning Algorithm 37
4.1.5 ETrieTree Partitioning Algorithm 38
4.2 Experimental Findings and Discussion 39
4.2.1 Experiment Description 39
4.2.2 Experimental Hardware Configuration 39
4.2.3 Experimental Results 40
4.2.4 Experiment Review 40

Chapter 5 Conclusions and Future Work 41
5.1 Conclusions 41
5.2 Future Work 42

References 41

List of Tables
Table 3-1 Example Node List 20

List of Figures
Figure 3-1 Zookeeper Operating Structure 18
Figure 3-2 Time Layering of the Borg-Tree 22
Figure 3-3 A Fully Constructed Borg-Tree 23
Figure 3-4 Inverse Borg-Tree 24
Figure 4-1 Actual Zookeeper Workflow 31
Figure 4-2 Actual Locutus Workflow 32
Figure 4-3 Comparison of Locutus and Zookeeper 33
References

[1] Kasim Selcuk Candan, Jong Wook Kim, Parth Nagarkar, Mithila Nagendra and Renwei Yu, "Scalable Multimedia Data Processing in Server Clusters," IEEE MultiMedia, pp. 3-5, 2010.
[2] Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, and Robert E. Gruber, "Bigtable: A Distributed Storage System for Structured Data," 7th USENIX Symposium on Operating Systems Design and Implementation, pp. 205-218, 2006.
[3] Jeffrey Dean and Sanjay Ghemawat, "MapReduce: Simplified Data Processing on Large Clusters," Communications of the ACM, Vol. 51, No. 1, pp. 107-113, 2008.
[4] Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung, "The Google File System," 19th ACM Symposium on Operating Systems Principles (SOSP), 2003.
[5] Wei Jiang and Gagan Agrawal, "Ex-MATE: Data Intensive Computing with Large Reduction Objects and Its Application to Graph Mining," IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, pp. 475-484, 2011.
[6] Chao Jin, Christian Vecchiola and Rajkumar Buyya, "MRPGA: An Extension of MapReduce for Parallelizing Genetic Algorithms," IEEE Fourth International Conference on eScience, pp. 214-220, 2008.
[7] Soila Kavulya, Jiaqi Tan, Rajeev Gandhi and Priya Narasimhan, "An Analysis of Traces from a Production MapReduce Cluster," IEEE/ACM International Conference on Cluster, Cloud and Grid Computing, pp. 94-95, 2010.
[8] Arun Krishnan, "GridBLAST: a Globus-based high-throughput implementation of BLAST in a Grid computing framework," Concurrency and Computation, Vol. 17, No. 13, pp. 1607-1623, 2005.
[9] Huan Liu and Dan Orban, "Cloud MapReduce: a MapReduce Implementation on top of a Cloud Operating System," IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, pp. 464-474, 2011.
[10] Andréa Matsunaga, Maurício Tsugawa and José Fortes, "Combining MapReduce and Virtualization on Distributed Resources," IEEE Fourth International Conference on eScience, pp. 224-225, 2008.
[11] Andréa Matsunaga, Maurício Tsugawa and José Fortes, "Programming Abstractions for Data Intensive Computing on Clouds and Grids," IEEE Fourth International Conference on eScience, pp. 489-493, 2008.
[12] Chris Miceli, Michael Miceli, Shantenu Jha, Hartmut Kaiser, Andre Merzky, "Programming Abstractions for Data Intensive Computing on Clouds and Grids," IEEE/ACM International Symposium on Cluster Computing and the Grid, pp. 480-483, 2009.
[13] Biswanath Panda, Mirek Riedewald and Daniel Fink, "The Model-Summary Problem and a Solution for Trees," International Conference on Data Engineering, pp. 452-455, 2010.
[14] Spiros Papadimitriou and Jimeng Sun, "Distributed Co-clustering with Map-Reduce," IEEE International Conference on Data Mining, p. 519, 2008.
[15] Sangwon Seo, Ingook Jang, Kyungchang Woo, Inkyo Kim, Jin-Soo Kim, "Prefetching and Pre-shuffling in Shared MapReduce Computation Environment," IEEE International Conference on Cluster Computing and Workshops, pp. 1-5, 2009.
[16] Jeffrey Shafer, Scott Rixner, and Alan L. Cox, "The Hadoop distributed filesystem: Balancing portability and performance," IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), p. 123, 2010.
[17] Heinz Stockinger, Marco Pagni, Lorenzo Cerutti and Laurent Falquet, "Grid Approach to Embarrassingly Parallel CPU-Intensive Bioinformatics Problems," IEEE International Conference on e-Science and Grid Computing, 2006.
[18] J. Tan, X. Pan, S. Kavulya, R. Gandhi, and P. Narasimhan, "Mochi: Visual Log-Analysis Based Tools for Debugging Hadoop," USENIX Workshop on Hot Topics in Cloud Computing (HotCloud), 2009.
[19] Chao Tian, Haojie Zhou, Yongqiang He, Li Zha, "A Dynamic MapReduce Scheduler for Heterogeneous Workloads," International Conference on Grid and Cooperative Computing, pp. 221-225, 2009.
[20] Himanshu Vashishtha, Michael Smit, Eleni Stroulia, "Moving Text Analysis Tools to the Cloud," IEEE World Congress on Services, pp. 110-112, 2010.
[21] Abhishek Verma, Xavier Llorà, David E. Goldberg and Roy H. Campbell, "Scaling Genetic Algorithms using MapReduce," International Conference on Intelligent Systems Design and Applications, 2009.
[22] Wei Xu, Ling Huang, Armando Fox, David Patterson, and Michael Jordan, "Detecting large-scale system problems by mining console logs," In Proceedings of the ACM SIGOPS 22nd Symposium on Operating Systems Principles (SOSP), 2009.
[23] Zacharia Fadika and Madhusudhan Govindaraju, "DELMA: Dynamic Elastic MapReduce Framework for CPU-Intensive Applications," IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, pp. 454-463, 2011.
[24] Owen O'Malley, "TeraByte Sort on Apache Hadoop," 2008.
[25] Apache Software Foundation, "Hadoop," 2007, http://hadoop.apache.org/core.
[26] Hadoop Distributed File System (HDFS) Architecture. [Online] Available: http://hadoop.apache.org/core/docs/current/hdfs_design.html
[27] HBase, http://hadoop.apache.org/hbase/.
[28] B. Kolbeck, M. Högqvist, J. Stender, F. Hupfeld, "Flease - Lease Coordination without a Lock Server," xtreemfs.org.
[29] B. W. Lampson, "How to build a highly available system using consensus," WDAG '96: Proceedings of the 10th International Workshop on Distributed Algorithms, London, UK: Springer-Verlag, 1996, pp. 1-17.
[30] J. Menon, D. A. Pease, R. Rees, L. Duyanovich, and B. Hillsberg, "IBM Storage Tank - a heterogeneous scalable SAN file system," IBM Systems Journal, Vol. 42, No. 2, pp. 250-267, 2003.
[31] L. Lamport, "Paxos made simple," SIGACT News, Vol. 32, No. 4, pp. 18-25, 2001.
[32] The Apache Zookeeper Project, "Apache Zookeeper website," http://zookeeper.apache.org/