National Digital Library of Theses and Dissertations in Taiwan

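Author: 黃心怡 (Huang, Hsin-Yi)
Title: 實現雲端運算 Hadoop MapReduce 之分級服務
Title (English): Realizing Prioritized MapReduce Service in Hadoop Distributed File System
Advisor: 葉佐任 (Yeh, Tso-Zen)
Oral defense committee: 葉佐任 (Yeh, Tso-Zen), 黃文吉 (Huang, Wen-Chi), 白英文 (Bai, Ying-Wen)
Oral defense date: 2016-01-26
Degree: Master's
Institution: 輔仁大學 (Fu Jen Catholic University)
Department: Master's Program, Department of Computer Science and Information Engineering
Discipline: Engineering
Field: Electrical and Computer Engineering
Thesis type: Academic thesis
Publication year: 2016
Graduation academic year: 104 (ROC calendar)
Language: Chinese
Keywords (Chinese): priority, disk scheduling, cloud computing, big data
Keywords (English): Hadoop, YARN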


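Abstract:

Hadoop is a widely applicable, highly scalable software platform and a fault-tolerant distributed system for processing large volumes of data. Like other application software, Hadoop runs on top of an operating system and communicates and coordinates with the hardware only through it. With the emergence of cloud computing and big data, the cloud platforms that support cloud service software have become very important.

Hadoop provides a resource allocation mechanism for the jobs it runs. Under this mechanism, submitted jobs are grouped into levels that receive different resource allocations; jobs at a level with a high allocation have a better chance of obtaining resources, and therefore of running first, than jobs at a level with a low allocation. Within a single level, however, there is no way to designate a particular job to run first. In a busy Hadoop environment, even a job with a high resource allocation cannot be guaranteed to obtain resources quickly and finish early, because many jobs at the same level are queued for resources. This research proposes a mechanism that lets specific users choose which of their jobs run with priority in a Hadoop environment. It also incorporates into the operating system the priority mechanism of a previously improved CFQ disk scheduler, together with earlier work on adding priority to memory replacement, so that a prioritized program running in Hadoop obtains resources first and its computation and I/O run faster in the operating system, improving its execution efficiency.

In our experiments, we ran multiple applications concurrently in Hadoop to simulate a busy environment and gave priority to specific programs among them. Comparing the results, a program ran faster with priority than the same program without it: its execution time was shortened by up to about 80% and by at least about 30%.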

Abstract (English):

Hadoop is a widely used, highly scalable software platform and a distributed system that can handle large amounts of data with high fault tolerance. Like other application software, Hadoop must run on top of an operating system and must communicate and coordinate with the hardware through it.
With the emergence of cloud computing and big data, the cloud software platforms that support cloud services have become very important.

Hadoop provides a mechanism for allocating resources to submitted jobs. Jobs submitted under this mechanism are assigned to groups with different levels of resource allocation, and jobs in a group with a high allocation have a better chance of obtaining resources, and thus of running first, than jobs in a group with a low allocation. However, a particular job cannot be given precedence over other jobs in the same resource allocation group. When Hadoop is busy, many jobs at the same allocation level wait in line, so even a job with a high resource allocation has no guarantee of obtaining resources quickly and finishing early. This research presents a Hadoop environment in which users can assign different priority levels to different jobs, and it adds priority mechanisms to the CFQ disk scheduler and to memory replacement in the operating system. As a result, the execution of high-priority programs is accelerated.
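The limitation described in this paragraph, and the kind of per-job priority that addresses it, can be illustrated with a small sketch. It is not the thesis implementation and does not use any Hadoop API: `PrioritizedQueue`, `submit`, `next_job`, and `drain` are hypothetical names, a single in-memory queue stands in for one resource allocation group, and the job names `sort` and `pi` are made up (`wordcount` and `grep` echo the programs named in the thesis outline). It assumes that jobs of equal priority within a group are served in arrival order.

```python
import heapq
from itertools import count


class PrioritizedQueue:
    """One allocation group whose waiting jobs are ordered by (priority, arrival)."""

    def __init__(self):
        self._heap = []
        self._arrival = count()  # FIFO tie-breaker among equal priorities

    def submit(self, job_name, priority=0):
        # heapq is a min-heap, so the priority is negated to pop the largest first.
        # Hypothetical convention: 0 = normal, larger = more urgent.
        heapq.heappush(self._heap, (-priority, next(self._arrival), job_name))

    def next_job(self):
        # The job that would receive the next free resource.
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)


def drain(queue):
    """Return job names in the order they would be granted resources."""
    order = []
    while len(queue):
        order.append(queue.next_job())
    return order


# Without priorities, jobs in the same group simply wait their turn.
plain = PrioritizedQueue()
for name in ["sort", "grep", "wordcount", "pi"]:
    plain.submit(name)
print(drain(plain))  # ['sort', 'grep', 'wordcount', 'pi']

# With a per-job priority, wordcount moves ahead of its peers.
busy = PrioritizedQueue()
for name in ["sort", "grep", "wordcount", "pi"]:
    busy.submit(name, priority=1 if name == "wordcount" else 0)
print(drain(busy))  # ['wordcount', 'sort', 'grep', 'pi']
```

The sketch covers only the ordering of jobs inside one group. The disk and memory priority mechanisms mentioned in the abstract operate inside the operating system and are not modeled here.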

In the experiments, we ran multiple applications simultaneously in Hadoop to simulate a busy environment and gave specific programs high priority to see how much faster they execute than with normal priority. Our results show that high priority reduces a program's execution time by between 30% and 80%.
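To put the reported range in perspective, a reduction in execution time can be converted into a speedup factor as 1 / (1 - reduction). The helper below is a hypothetical illustration, not part of the thesis; it only restates the 30% and 80% figures quoted in the abstract.

```python
def speedup_from_reduction(reduction):
    """Convert a fractional reduction in execution time into a speedup factor."""
    if not 0 <= reduction < 1:
        raise ValueError("reduction must be in [0, 1)")
    # New time = (1 - reduction) * old time, so speedup = old / new.
    return 1.0 / (1.0 - reduction)


for r in (0.30, 0.80):
    print(f"{r:.0%} shorter run time -> {speedup_from_reduction(r):.2f}x speedup")
# 30% shorter run time -> 1.43x speedup
# 80% shorter run time -> 5.00x speedup
```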

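Table of contents:

1 Introduction
1.1 Research Motivation
1.2 Research Objectives
1.3 Thesis Organization
2 Background and Related Work
2.1 Hadoop Distributed System Architecture
2.2 Hadoop YARN Design and Operation
2.2.1 Communication Between Hadoop YARN Components
2.2.2 Overall Hadoop YARN Workflow
2.3 Hadoop Resource Scheduling Mechanisms
2.3.1 Capacity Scheduler Resource Allocation
2.3.2 Fair Scheduler Resource Allocation
2.3.3 FIFO Scheduler Resource Allocation
2.4 MapReduce Operation
2.5 Overview of Priority
2.5.1 CPU Scheduling Priority
2.5.2 Disk Read/Write Priority
2.5.3 Memory Priority
2.5.4 Hadoop Priority
3 Research Design and Implementation
3.1 Design Architecture
3.2 Setting Priority in Hadoop YARN
3.2.1 YARN Client Design and Priority Propagation
3.2.2 ResourceManager Design and Priority Propagation
3.2.3 NodeManager Design and Priority Propagation
3.3 MapReduce Design and Priority Propagation
3.4 Setting I/O Priority in Hadoop HDFS and Priority Propagation
3.5 Raising Priority Through Cooperation Between Hadoop and the Operating System
3.5.1 Disk I/O Priority
3.5.2 Memory Priority
4 Experiments and Analysis
4.1 Experimental Design
4.2 Experimental Hardware
4.3 Experiment 1: One Queue in the Fair Scheduler
4.3.1 Raising the Priority of One Program (wordCount) in the Fair Scheduler
4.3.2 Raising the Priority of Two Programs (wordCount, Grep) in the Fair Scheduler
4.4 Experiment 2: Three Queues with Equal Resource Allocation in the Fair Scheduler
4.4.1 Raising the Priority of One Program (wordCount) in the Fair Scheduler
4.4.2 Raising the Priority of Two Programs (wordCount, Grep) in the Fair Scheduler
4.5 Experiment 3: One High-Allocation Queue and Two Low-Allocation Queues in the Fair Scheduler
4.5.1 Raising the Priority of One Program (wordCount) in the Fair Scheduler
4.5.2 Raising the Priority of Two Programs (wordCount, Grep) in the Fair Scheduler
4.6 Experiment 4: Three Queues with Equal Resource Allocation in the Capacity Scheduler
4.6.1 Raising the Priority of One Program (wordCount) in the Capacity Scheduler
4.6.2 Raising the Priority of Two Programs (wordCount, Grep) in the Capacity Scheduler
4.7 Experiment 5: One High-Allocation Queue and Two Low-Allocation Queues in the Capacity Scheduler
4.7.1 Raising the Priority of One Program (wordCount) in the Capacity Scheduler
4.7.2 Raising the Priority of Two Programs (wordCount, Grep) in the Capacity Scheduler
5 Conclusions and Future Work
References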



