National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)

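Graduate student: Huang, Hsin-Yi (黃心怡)
Thesis title: Realizing Prioritized MapReduce Service in Hadoop Distributed File System (original Chinese title: 實現雲端運算 Hadoop MapReduce 之分級服務)
Advisor: Yeh, Tso-Zen (葉佐任)
Oral defense committee: Yeh, Tso-Zen (葉佐任); Huang, Wen-Chi (黃文吉); Bai, Ying-Wen (白英文)
Oral defense date: 2016-01-26
Degree: Master's
Institution: Fu Jen Catholic University (輔仁大學)
Department: Master's Program, Department of Computer Science and Information Engineering (資訊工程學系碩士班)
Discipline: Engineering
Field: Electrical and Computer Engineering (電資工程學類)
Thesis type: Academic thesis
Year of publication: 2016
Graduation academic year: 104 (ROC calendar)
Language: Chinese
Pages: 76
Keywords (Chinese, translated): priority; disk scheduling; cloud computing; big data
Keywords (English): Hadoop; YARN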
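Abstract (translated from the Chinese):
Hadoop is a widely applicable, highly scalable software platform and a fault-tolerant distributed system capable of processing large volumes of data. Like other application software, Hadoop must run on top of an operating system and relies on it to communicate and coordinate with the hardware. With the emergence of cloud computing and big data, the cloud platforms that support the execution of cloud service software have become very important.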

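Hadoop provides a resource allocation mechanism for the jobs it runs. Under this mechanism, submitted jobs are grouped into levels with different resource allocations; jobs at a level with a high allocation have a better chance of obtaining resources, and of running first, than jobs at a level with a low allocation. Within a single level, however, there is no way to designate a particular job to run ahead of the others. In a busy Hadoop environment, therefore, even a job at a high-allocation level is not guaranteed to obtain resources quickly and finish early, because many jobs at the same level are queued for the same resources. This study proposes a mechanism that lets designated users set, on their own, which jobs are to run with priority in a Hadoop environment. At the operating-system level, it incorporates the priority mechanism of a previously improved CFQ disk scheduler, together with earlier work on adding priority to the memory swapping mechanism. As a result, a prioritized program running in Hadoop obtains resources first, and its computation and I/O are sped up in the operating system, improving its execution efficiency.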
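The gap described above, that jobs at the same allocation level cannot be ordered relative to one another, can be illustrated with a minimal sketch. It is not the thesis's YARN implementation: the `JobQueue` class, the job names, and the convention that a larger number means a more urgent job are all hypothetical, chosen only to show how a per-job priority changes the order within a single queue while leaving ordinary jobs in submission order.

```python
import heapq
from itertools import count


class JobQueue:
    """One scheduler queue; all jobs in it share the same resource allocation level."""

    def __init__(self):
        self._heap = []
        self._arrival = count()  # tie-breaker that preserves submission order

    def submit(self, name, priority=0):
        # Hypothetical convention: a larger number means a more urgent job.
        # heapq is a min-heap, so the priority is negated.
        heapq.heappush(self._heap, (-priority, next(self._arrival), name))

    def drain(self):
        """Return job names in the order they would be granted resources."""
        order = []
        while self._heap:
            order.append(heapq.heappop(self._heap)[2])
        return order


queue = JobQueue()
queue.submit("grep")
queue.submit("sort")
queue.submit("wordcount", priority=1)  # the user marks this job as prioritized
queue.submit("terasort")
print(queue.drain())  # ['wordcount', 'grep', 'sort', 'terasort']
```

Without the `priority=1` argument the queue degenerates to plain first-in, first-out order, which corresponds to the situation the abstract describes for jobs within one level.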

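In our experiments, we ran multiple applications concurrently in a Hadoop system to simulate a busy environment and gave priority to specific programs among them. Comparing the results, the same program ran in up to about 80% less time, and at least about 30% less time, with priority than without.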

Abstract (English):
Hadoop is a widely used, highly scalable software platform and a fault-tolerant distributed system that can handle large amounts of data. Like other application software, Hadoop must run on top of an operating system and must communicate and coordinate with the hardware through it. With the emergence of cloud computing and big data, the cloud software platforms that support the execution of cloud services have become very important.

Hadoop provides a mechanism for allocating resources to the jobs it runs. Jobs submitted under this mechanism are assigned to groups with different levels of resource allocation, and jobs with a high allocation have a better chance of obtaining resources, and of running first, than jobs with a low allocation. However, a particular job cannot be given precedence over other jobs in the same resource allocation group. When Hadoop is busy, many jobs at the same allocation level wait in line, so even a job with a high allocation has no guarantee of obtaining resources quickly and finishing early. This research presents a Hadoop environment in which users can assign different priority levels to different jobs, supported by priority mechanisms added to the CFQ disk scheduler and to memory replacement in the operating system. As a result, the execution of high-priority programs is accelerated.
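For a rough sense of what an operating-system I/O priority looks like in practice, the sketch below builds a command line for the standard Linux `ionice` utility, which sets the I/O scheduling class and level of a process. This is a stand-in only: the thesis relies on its own modified CFQ scheduler, and nothing here reflects how that work passes priorities from Hadoop to the kernel. The helper name `ionice_command` and the example PID are hypothetical.

```python
from typing import List

# I/O scheduling classes accepted by `ionice -c`.
IO_CLASSES = {"realtime": 1, "best-effort": 2, "idle": 3}


def ionice_command(pid: int, io_class: str = "best-effort", level: int = 4) -> List[str]:
    """Build (but do not run) an ionice command for an existing process.

    For the realtime and best-effort classes, level ranges from 0 to 7,
    and a lower number means a higher I/O priority.
    """
    if io_class not in IO_CLASSES:
        raise ValueError(f"unknown I/O class: {io_class}")
    cmd = ["ionice", "-c", str(IO_CLASSES[io_class])]
    if io_class != "idle":  # the idle class takes no level
        if not 0 <= level <= 7:
            raise ValueError("level must be between 0 and 7")
        cmd += ["-n", str(level)]
    cmd += ["-p", str(pid)]
    return cmd


# Hypothetical PID of a task process that should get the highest best-effort level.
print(" ".join(ionice_command(1234, "best-effort", 0)))  # ionice -c 2 -n 0 -p 1234
```

The function only assembles the argument list, so it can be inspected or logged before being handed to something like `subprocess.run` on a Linux host.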

In the experiments, we ran multiple applications simultaneously in a Hadoop system to simulate a busy environment and gave specific programs high priority to see how much faster they ran than with normal priority. Our results show that the execution time of high-priority programs was reduced by between 30% and 80%.
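To make the reported range concrete, the short sketch below converts between a percentage reduction and run times. The 1000-second baseline is a made-up figure used purely for illustration and is not a measurement from the thesis; the function names are likewise hypothetical.

```python
def reduction_percent(baseline_seconds: float, prioritized_seconds: float) -> float:
    """Percentage by which the prioritized run is shorter than the baseline run."""
    if baseline_seconds <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * (baseline_seconds - prioritized_seconds) / baseline_seconds


def time_after_reduction(baseline_seconds: float, percent: float) -> float:
    """Run time that remains after reducing the baseline by the given percentage."""
    return baseline_seconds * (100.0 - percent) / 100.0


baseline = 1000.0  # hypothetical run time without priority, in seconds
for percent in (30.0, 80.0):
    remaining = time_after_reduction(baseline, percent)
    print(f"{percent:.0f}% reduction: {baseline:.0f} s -> {remaining:.0f} s")
```

Under this assumed baseline, a 30% reduction leaves a 700-second run and an 80% reduction leaves a 200-second run, so the upper end of the range corresponds to a job finishing in one fifth of its original time.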

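1 Introduction
1.1 Research Motivation
1.2 Research Objectives
1.3 Thesis Organization
2 Background and Related Work
2.1 Hadoop Distributed System Architecture
2.2 Hadoop YARN Design and Operation
2.2.1 Communication Among Hadoop YARN Components
2.2.2 Overall Hadoop YARN Workflow
2.3 Hadoop Resource Scheduling Mechanisms
2.3.1 Capacity Scheduler Resource Allocation
2.3.2 Fair Scheduler Resource Allocation
2.3.3 FIFO Scheduler Resource Allocation
2.4 How MapReduce Works
2.5 Overview of Priority
2.5.1 CPU Scheduling Priority
2.5.2 Disk Read/Write Priority
2.5.3 Memory Priority
2.5.4 Hadoop Priority
3 Design and Implementation
3.1 Design Architecture
3.2 Setting Priority in Hadoop YARN
3.2.1 YARN Client Design and Priority Propagation
3.2.2 ResourceManager Component Design and Priority Propagation
3.2.3 NodeManager Component Design and Priority Propagation
3.3 MapReduce Design and Priority Propagation
3.4 Setting I/O Priority in Hadoop HDFS and Priority Propagation
3.5 Raising Priority Through Cooperation Between Hadoop and the Operating System
3.5.1 Disk I/O Priority
3.5.2 Memory Priority
4 Experiments and Analysis
4.1 Experimental Design
4.2 Experimental Hardware
4.3 Experiment 1: One Queue in the Fair Scheduler
4.3.1 Raising the Priority of One Program (wordCount) in the Fair Scheduler
4.3.2 Raising the Priority of Two Programs (wordCount, Grep) in the Fair Scheduler
4.4 Experiment 2: Three Queues with Equal Resource Allocation in the Fair Scheduler
4.4.1 Raising the Priority of One Program (wordCount) in the Fair Scheduler
4.4.2 Raising the Priority of Two Programs (wordCount, Grep) in the Fair Scheduler
4.5 Experiment 3: One High-Allocation Queue and Two Low-Allocation Queues in the Fair Scheduler
4.5.1 Raising the Priority of One Program (wordCount) in the Fair Scheduler
4.5.2 Raising the Priority of Two Programs (wordCount, Grep) in the Fair Scheduler
4.6 Experiment 4: Three Queues with Equal Resource Allocation in the Capacity Scheduler
4.6.1 Raising the Priority of One Program (wordCount) in the Capacity Scheduler
4.6.2 Raising the Priority of Two Programs (wordCount, Grep) in the Capacity Scheduler
4.7 Experiment 5: One High-Allocation Queue and Two Low-Allocation Queues in the Capacity Scheduler
4.7.1 Raising the Priority of One Program (wordCount) in the Capacity Scheduler
4.7.2 Raising the Priority of Two Programs (wordCount, Grep) in the Capacity Scheduler
5 Conclusions and Future Work
References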
