
National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)

Author: Lin, Jia-Chun (林佳純)
Title: Study of Job Execution Performance, Reliability, Energy Consumption, and Fault Tolerance in the MapReduce Framework (MapReduce 工作執行效能、可靠性、能源耗費與容錯之研究)
Advisors: Chen, Ying-ping (陳穎平); Leu, Fang-Yie (呂芳懌)
Oral defense committee: Dow, Chyi-Ren (竇其仁); Lo, Chi-Chun (羅濟群); Leu, Fang-Yie (呂芳懌); Chen, Ying-ping (陳穎平); Deng, Der-Jiunn (鄧德雋); Yang, Chao-Tung (楊朝棟)
Oral defense date: 2015-03-21
Degree: Doctoral
Institution: National Chiao Tung University (國立交通大學)
Department: Institute of Computer Science and Engineering (資訊科學與工程研究所)
Discipline: Engineering
Field: Electrical Engineering and Computer Science
Thesis type: Academic thesis
Year of publication: 2015
Graduation academic year: 103 (ROC calendar)
Language: English
Number of pages: 88
Keywords: MapReduce; Hadoop; job completion reliability; job turnaround time; job energy consumption; single-point-of-failure; fault tolerance
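Machines in a large-scale MapReduce cluster may fail for many reasons while the cluster is running. To keep MapReduce jobs from being aborted by machine failures, MapReduce adopts several policies, such as the task re-execution policy, the intermediate-data replication policy, and the reduce-task assignment policy. However, the impact of these policies on MapReduce jobs is unclear, particularly with respect to job completion reliability, job turnaround time, and job energy consumption. In this dissertation, job completion reliability is the reliability with which a MapReduce job can be completed by a MapReduce cluster, job turnaround time is the time the job takes to execute on that cluster, and job energy consumption is the electrical energy consumed in executing the job on that cluster. Understanding these impacts in depth is necessary for achieving a more reliable and more energy-efficient computing environment. In addition, the MapReduce master server suffers from a single-point-of-failure problem: when it fails, the operation and services of the entire MapReduce cluster are interrupted. To study how the different MapReduce policies affect job performance, this dissertation analyzes in depth the job completion reliability, job turnaround time, and job energy consumption of MapReduce jobs under each policy. To address the single-point-of-failure problem, this dissertation also proposes a Proactive and Adaptive Redundant System (PAReS) that mitigates the single point of failure of the MapReduce master server while improving its service quality. Our analytical results help MapReduce administrators understand the impact of these policies, help them choose appropriate policies for their clusters to improve job performance, and help MapReduce framework designers devise policies better suited to MapReduce. Furthermore, our experimental results indicate that the proposed PAReS effectively mitigates the single-point-of-failure problem of the MapReduce master server and substantially improves its service quality.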
Node/machine failure is the norm rather than the exception in a large-scale MapReduce cluster. To prevent jobs from being interrupted by machine/node failures, MapReduce employs several policies, such as the task re-execution policy, the intermediate-data replication policy, and the reduce-task assignment policy. However, the impacts of these policies on MapReduce jobs are not clear, especially in terms of Job Completion Reliability (JCR), Job Turnaround Time (JTT), and Job Energy Consumption (JEC). In this dissertation, JCR is the reliability with which a MapReduce job can be completed by a MapReduce cluster, JTT is the time period starting when the job is submitted to the cluster and ending when the job is completed by the cluster, and JEC is the energy consumed by the cluster to complete the job. To achieve a more reliable and energy-efficient computing environment than the current MapReduce infrastructure, it is essential to understand the impacts of the above policies. In addition, the MapReduce master servers suffer from a single-point-of-failure problem, which might interrupt MapReduce operations and filesystem services. To study how the above policies influence the performance of MapReduce jobs, we formally derive and analyze the JCR, JTT, and JEC of a MapReduce job under these policies. To mitigate the single-point-of-failure problem and improve the service quality of MapReduce master servers, we also propose a hybrid takeover scheme called PAReS (Proactive and Adaptive Redundant System) for MapReduce master servers. The analyses in this dissertation enable MapReduce managers to understand the influence of these policies on MapReduce jobs, help them choose appropriate policies for their MapReduce clusters, and allow MapReduce designers to propose better policies for MapReduce. Furthermore, our extensive experimental results show that the proposed PAReS system mitigates the single-point-of-failure problem and improves the service quality of MapReduce master servers compared with current redundant schemes on Hadoop.
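To make the notion of JCR under a task re-execution policy more concrete, the following is a minimal sketch and not the derivation used in the dissertation (which is based on the universal generating function). It assumes, purely for illustration, that every task attempt fails independently with the same probability, that a task may be attempted at most a fixed number of times, and that a job completes only if every task eventually succeeds. All function and parameter names are hypothetical.

```python
# Toy model of job completion reliability (JCR) under task re-execution.
# Assumptions (not from the dissertation): attempts fail independently with
# the same probability, and a job completes only if all tasks succeed.


def task_success_probability(attempt_failure_prob: float, max_attempts: int) -> float:
    """Probability that a task succeeds within max_attempts attempts."""
    # The task fails only if every one of its attempts fails.
    return 1.0 - attempt_failure_prob ** max_attempts


def job_completion_reliability(
    attempt_failure_prob: float, max_attempts: int, num_tasks: int
) -> float:
    """Probability that all num_tasks tasks succeed, under independence."""
    return task_success_probability(attempt_failure_prob, max_attempts) ** num_tasks


def expected_attempts_per_task(attempt_failure_prob: float, max_attempts: int) -> float:
    """Expected number of attempts actually made for one task.

    Attempt i + 1 is made only if the first i attempts all failed, which
    happens with probability attempt_failure_prob ** i.
    """
    return sum(attempt_failure_prob ** i for i in range(max_attempts))


if __name__ == "__main__":
    # Illustrative values only: 5% attempt failure probability, 200 tasks.
    for attempts in (1, 2, 4):
        jcr = job_completion_reliability(0.05, attempts, 200)
        cost = expected_attempts_per_task(0.05, attempts)
        print(f"max_attempts={attempts}: JCR={jcr:.6f}, expected attempts/task={cost:.4f}")
```

Under these assumptions, allowing more attempts raises JCR while also raising the expected number of attempts per task, which is a rough proxy for the extra time and energy that re-execution costs. The dissertation's analysis of JTT and JEC addresses that trade-off formally.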
Abstract (in Chinese)
Abstract
Table of Contents
List of Figures
List of Tables
1. Introduction
2. Background and Related Work
2.1 Background of MapReduce
2.2 Related Work
2.2.1 Performance analyses and evaluation for MapReduce
2.2.2 Redundant mechanisms for MapReduce master servers
3. Impacts of the TR Policy
3.1 The Universal Generating Function
3.2 Job Completion Reliability
3.3 Job Turnaround Time and Job Energy Consumption
3.3.1 The u-function for the map tasks of J
3.3.2 The u-function for the reduce tasks of J
3.3.3 The expected JTT and JEC of J
3.4 Experimental and Analytical Results
3.4.1 Scenario 1: MapReduce jobs with different filtering percentages
3.4.2 Scenario 2: MapReduce jobs with different input data sizes
3.4.3 Scenario 3: MapReduce jobs with different numbers of reduce tasks
3.5 Summary
4. Impacts of Intermediate-data Replication Policies and Reduce-task Assignment Policies
4.1 MapReduce Policies
4.1.1 The locally-stored policy
4.1.2 The distributively-stored policy
4.1.3 The non-delay assignment policy
4.1.4 The delay assignment policy
4.2 Job Completion Reliability
4.2.1 JCR on POC-1
4.2.2 JCR on POC-2
4.2.3 JCR on POC-3
4.2.4 JCR on POC-4
4.3 Job Energy Consumption
4.3.1 JEC on POC-1
4.3.2 JEC on POC-2
4.3.3 JEC on POC-3
4.3.4 JEC on POC-4
4.4 Performance Evaluation and Comparison
4.4.1 Scenario 1: jobs having only one reduce task in the P-case
4.4.2 Scenario 2: jobs having only one reduce task in the S-case
4.4.3 Scenario 3: jobs having only multiple reduce tasks in the P-case
4.4.4 Scenario 4: jobs having only multiple reduce tasks in the S-case
4.5 Summary
5. Mitigating the Single-Point-of-Failure Problem
5.1 The Proposed PAReS Scheme
5.1.1 State-update operations
5.1.2 The proactive synchronization and replication method (PSR method)
5.1.3 The mutual monitoring algorithm (MM algorithm)
5.1.4 The adaptive warm-up mechanism
5.1.5 Takeover process in PAReS
5.2 Comparison of Energy Consumption and Synchronization Cost
5.3 Comparison of Service Qualities
5.3.1 Service downtime
5.3.2 Interrupted and dropped requests affected by the master server's failure
5.3.3 The takeover performance of WNode
5.4 Summary
6. Conclusions and Future Work
6.1 Conclusions
6.2 Future Work
Bibliography

