
National Digital Library of Theses and Dissertations in Taiwan


Author: 林映辰
Author (English): Ying-Chen Lin
Title: Design of a Load-Balancing Algorithm for the HDFS Distributed File System
Title (English): A Load-Balancing Algorithm for Hadoop Distributed File System
Advisor: 林其誼
Committee members: 林振緯, 蔡智強, 林其誼
Oral defense date: 2015-06-29
Degree: Master's
Institution: Tamkang University (淡江大學)
Department: Master's Program, Department of Computer Science and Information Engineering
Discipline: Engineering
Field: Electrical and Computer Engineering
Thesis type: Academic thesis
Year of publication: 2015
Graduation academic year: 103 (2014-2015)
Language: Chinese
Number of pages: 74
Keywords (Chinese): Hadoop, 分散式檔案系統 (distributed file system), 負載均衡 (load balancing), 雲端運算 (cloud computing)
Keywords (English): Hadoop, Distributed File System, Load Balancing, Cloud Computing
With the growing popularity of the Internet and falling storage costs, many enterprises have begun to offer cloud services, such as the Hadoop Distributed File System (HDFS), Hadoop MapReduce, and HBase released by the Apache Software Foundation; among these, HDFS is especially popular. HDFS has a master-slave architecture consisting of a single NameNode and multiple racks of DataNodes. The NameNode is the master, responsible for managing the DataNodes and user accesses, while the DataNodes are slaves responsible for storing files.
HDFS stores files in a distributed fashion: each file is split into fixed-size blocks, and by default each block is stored as three replicas on different DataNodes. However, HDFS's block-placement policy does not take the DataNodes' storage utilization into account, which can lead to storage-load imbalance, so the Balancer is needed to resolve it. The Balancer is a NameNode tool executed on command by the administrator; it repeatedly moves blocks from DataNodes with higher storage utilization to those with lower utilization until every DataNode's utilization lies within a threshold of the average. Running the Balancer, however, requires moving large numbers of blocks, which consumes network resources. Moreover, previous research has shown that the NameNode is the system's performance bottleneck, so having the NameNode run the Balancer further degrades HDFS performance.
Therefore, this research designs a new storage load-balancing algorithm that considers all factors affecting storage utilization. The algorithm introduces a new node, the BalanceNode, responsible for pairing DataNodes with high and low storage utilization, so that lightly loaded DataNodes can take over part of the storage load of heavily loaded ones.


With the advancement of the Internet and increasing data demands, many enterprises are offering cloud services to their customers. Among various cloud computing platforms, the Apache Hadoop project has been widely adopted by many large organizations and enterprises. In the Hadoop ecosystem, the Hadoop Distributed File System (HDFS), Hadoop MapReduce, and HBase are open-source equivalents of the Google-proposed Google File System (GFS), MapReduce framework, and BigTable, respectively. To meet the requirement of horizontal storage scaling in the big-data era, HDFS has received significant attention from researchers. HDFS clusters have a master-slave architecture: each cluster contains a single NameNode and a number of DataNodes. The NameNode is the master, responsible for managing the DataNodes and client accesses; the DataNodes are slaves, responsible for storing data.
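The block and replica scheme underlying this architecture can be sketched in a few lines. This is an illustrative Python model, not HDFS code: the 128 MB block size and round-robin placement are stand-ins for HDFS's configurable block size and its real rack-aware placement policy.

```python
import math

BLOCK_SIZE = 128 * 1024 * 1024  # bytes; a common HDFS default (older releases used 64 MB)
REPLICATION = 3                  # HDFS default replication factor

def split_into_blocks(file_size: int, block_size: int = BLOCK_SIZE) -> int:
    """Number of fixed-size blocks a file of `file_size` bytes occupies."""
    return math.ceil(file_size / block_size)

def place_replicas(block_id: int, datanodes: list[str],
                   replication: int = REPLICATION) -> list[str]:
    """Pick `replication` distinct DataNodes for one block.

    Simple round-robin by block id; the real HDFS policy is rack-aware
    (first replica local, second on a remote rack, third on that rack).
    """
    n = len(datanodes)
    return [datanodes[(block_id + i) % n] for i in range(replication)]
```

For example, a 300 MB file occupies three 128 MB blocks, and each block lands on three distinct DataNodes.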
As the name suggests, HDFS stores files in a distributed fashion. Files are divided into fixed-size blocks, and in the default configuration each block has three replicas stored on three different DataNodes to ensure the fault tolerance of HDFS. However, Hadoop's default strategy for allocating new blocks does not take the DataNodes' utilization into account, which can lead to load imbalance in HDFS. To cope with this problem, the NameNode has a built-in tool called Balancer, which can be executed by the system administrator. Balancer iteratively moves blocks from DataNodes with high utilization to those with low utilization, keeping each DataNode's disk utilization within a configurable range centered at the average utilization. The primary cost of using Balancer to achieve load balance is the bandwidth consumed by moving blocks. Moreover, previous research shows that the NameNode is the performance bottleneck of HDFS, so frequent execution of Balancer by the NameNode may degrade the performance of HDFS.
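The Balancer's threshold rule described above can be sketched as follows. This is a minimal model, not the real tool: it assumes equally sized DataNodes, treats utilization as a continuous fraction rather than discrete blocks, and records the transfers needed to bring every node into the band [average - threshold, average + threshold].

```python
def balance(utilization: dict[str, float], threshold: float = 0.1):
    """One Balancer-style rebalancing pass (simplified sketch).

    `utilization` maps DataNode name -> used/capacity in [0, 1].
    Nodes above avg + threshold are sources; nodes below avg - threshold
    are sinks. Returns the list of (source, sink, amount) transfers.
    """
    avg = sum(utilization.values()) / len(utilization)
    over = [d for d, u in utilization.items() if u > avg + threshold]
    under = [d for d, u in utilization.items() if u < avg - threshold]
    moves = []
    for src in over:
        for dst in under:
            # move only as much as brings both nodes inside the band
            surplus = utilization[src] - (avg + threshold)
            deficit = (avg - threshold) - utilization[dst]
            amount = min(surplus, deficit)
            if amount > 0:
                utilization[src] -= amount
                utilization[dst] += amount
                moves.append((src, dst, amount))
    return moves
```

On a three-node cluster at 90%, 50%, and 10% utilization with a 10% threshold, a single transfer from the hottest to the coldest node brings all nodes within [40%, 60%]; the thesis's criticism is that in real HDFS these transfers cost network bandwidth and are driven by the already-bottlenecked NameNode.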
Therefore, in this research we design a new load-balancing algorithm that considers all the situations that may influence the load-balancing state. The proposed algorithm introduces a new role, the BalanceNode, which helps match heavy-loaded with light-loaded nodes so that light-loaded nodes can take over part of the load from heavy-loaded ones. The simulation results show that our algorithm not only achieves a good load-balancing state in HDFS but also minimizes the block-movement cost.
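The BalanceNode's matching step might look like the sketch below. This is an assumption-laden illustration, not the thesis's actual pseudocode (that is given in Tables 3-6 to 3-8): the `low`/`high` parameters play the role of the l and h thresholds of Section 3.2.3, and pairing heaviest-with-lightest is one plausible matching rule.

```python
def match_nodes(utilization: dict[str, float],
                low: float, high: float) -> list[tuple[str, str]]:
    """BalanceNode-style pairing of heavy- and light-loaded DataNodes.

    Nodes above `high` are heavy-loaded, nodes below `low` are
    light-loaded. The heaviest node is paired with the lightest, the
    second-heaviest with the second-lightest, and so on, so that each
    light-loaded DataNode absorbs blocks from one heavy-loaded DataNode.
    """
    heavy = sorted((d for d, u in utilization.items() if u > high),
                   key=lambda d: utilization[d], reverse=True)
    light = sorted((d for d, u in utilization.items() if u < low),
                   key=lambda d: utilization[d])
    return list(zip(heavy, light))
```

Offloading this pairing to a dedicated BalanceNode, rather than the NameNode, is what lets the proposed design avoid adding work to the cluster's performance bottleneck.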


Table of Contents
Chapter 1 Introduction
1.1 Research Background
1.2 Research Scope and Significance
1.3 Research Motivation and Objectives
1.3.1 Overview of HDFS Replica Management
1.3.2 Overview of HDFS Block Placement
1.3.3 Overview of the HDFS Balancer
1.4 Thesis Organization
Chapter 2 Related Work
2.1 Load-Balancing Algorithms Based on the Chord DHT Architecture
2.1.1 The Chord DHT Architecture
2.1.2 Load-Balancing Algorithms for Peer-to-Peer Systems
2.1.3 Load-Balancing Algorithms for Cloud Distributed File Systems
2.2 Improved Block Placement Algorithms
2.3 Load-Balancing Algorithms for Cloud Computing Environments
2.4 Literature Summary
Chapter 3 Algorithm Design
3.1 Handling Block Addition
3.2 Handling Block Deletion
3.2.1 Overview of the BalanceNode
3.2.2 Computing Each DataNode's Block-Deletion Ratio
3.2.3 Setting the Thresholds l, h, and δ
3.2.4 Block Transfer between Light-Loaded and Heavy-Loaded DataNodes
3.3 Handling DataNode Addition
3.4 Handling DataNode Removal
3.5 Method Summary
Chapter 4 Simulation Experiments and Analysis
4.1 Experiment 1
4.2 Experiment 2
4.3 Experiment 3
4.4 Experiment 4
Chapter 5 Conclusions and Future Work
References
Appendix – English Thesis

List of Figures
Figure 1-1 Digital Universe data volume [7]
Figure 1-2 HDFS architecture
Figure 2-1 The Chord DHT architecture [19]
Figure 2-2 Inspiration from the Honeybee algorithm [16]
Figure 3-1 Flowchart of block transfer between light-loaded and heavy-loaded DataNodes
Figure 4-1 User behavior in Experiments 1 and 2
Figure 4-2 Standard deviation of DataNode storage utilization, Experiment 1
Figure 4-3 Number of blocks moved, Experiment 1
Figure 4-4 Elapsed time, Experiment 1
Figure 4-5 Final distribution of newly added blocks, Experiment 1
Figure 4-6 Standard deviation of DataNode storage utilization, Experiment 2
Figure 4-7 Number of blocks moved, Experiment 2
Figure 4-8 Final distribution of newly added blocks, Experiment 2
Figure 4-9 Standard deviation of DataNode storage utilization, Experiment 3
Figure 4-10 Number of blocks moved, Experiment 3
Figure 4-11 Elapsed time, Experiment 3
Figure 4-12 Final distribution of newly added blocks, Experiment 3
Figure 4-13 Standard deviation of DataNode storage utilization, Experiment 4
Figure 4-14 Number of blocks moved, Experiment 4
Figure 4-15 Final distribution of newly added blocks, Experiment 4

List of Tables
Table 2-1 Literature summary
Table 3-1 Task assignments for block addition
Table 3-2 Task assignments for block deletion
Table 3-3 Task assignments for DataNode addition
Table 3-4 Task assignments for DataNode removal
Table 3-5 Task assignments for pairing light-loaded and heavy-loaded DataNodes
Table 3-6 Pseudocode for the DataNode
Table 3-7 Pseudocode for the NameNode
Table 3-8 Pseudocode for the BalanceNode
Table 4-1 DataNode capacity configuration, Experiments 1 and 2
Table 4-2 Standard deviation of DataNode storage utilization, Experiment 1
Table 4-3 Number of blocks moved, Experiment 1
Table 4-4 Elapsed time, Experiment 1
Table 4-5 Final distribution of newly added blocks, Experiment 1
Table 4-6 Standard deviation of DataNode storage utilization, Experiment 2
Table 4-7 Number of blocks moved, Experiment 2
Table 4-8 Final distribution of newly added blocks, Experiment 2
Table 4-9 DataNode capacity configuration, Experiments 3 and 4
Table 4-10 Standard deviation of DataNode storage utilization, Experiment 3
Table 4-11 Number of blocks moved, Experiment 3
Table 4-12 Elapsed time, Experiment 3
Table 4-13 Final distribution of newly added blocks, Experiment 3
Table 4-14 Standard deviation of DataNode storage utilization, Experiment 4
Table 4-15 Number of blocks moved, Experiment 4
Table 4-16 Final distribution of newly added blocks, Experiment 4


[1] Apache Hadoop. https://hadoop.apache.org/, last accessed June, 2015.
[2] Apache HBase. http://hbase.apache.org/, last accessed June, 2015.
[3] MapReduce. http://hadoop.apache.org/docs/r1.2.1/mapred_tutorial.html, last accessed June, 2015.
[4] Chang, F., Dean, J., Ghemawat, S., Hsieh, W. C., Wallach, D. A., Burrows, M., et al. (2008). Bigtable: A distributed storage system for structured data. ACM Transactions on Computer Systems, 26(2), 4:1-4:26.
[5] Dean, J., & Ghemawat, S. (2008). MapReduce: Simplified data processing on large clusters. Communications of the ACM, 51(1), 107-113.
[6] Fan, K., Zhang, D., Li, H., & Yang, Y. (2013). An adaptive feedback load balancing algorithm in HDFS. Proceedings of the 2013 5th International Conference on Intelligent Networking and Collaborative Systems, pp. 23-29.
[7] Gantz, J., & Reinsel, D. (2012). The digital universe in 2020. Big Data, Bigger Digital Shadows, and Biggest Growth in the Far East. IDC iView, https://www.emc-technology.com/collateral/analyst-reports/idc-the-digital-universe-in-2020.pdf, last accessed June, 2015.
[8] Ghemawat, S., Gobioff, H., & Leung, S. (2003). The Google file system. Proceedings of the Nineteenth ACM Symposium on Operating Systems Principles, Bolton Landing, NY, USA. pp. 29-43.
[9] Hsiao, H., Chung, H., Shen, H., & Chao, Y. (2013). Load rebalancing for distributed file systems in clouds. IEEE Transactions on Parallel and Distributed Systems, 24(5), 951-962.
[10] Jelasity, M., Montresor, A., & Babaoglu, O. (2005). Gossip-based aggregation in large dynamic networks. ACM Transactions on Computer Systems, 23(3), 219-252.
[11] Jelasity, M., Voulgaris, S., Guerraoui, R., Kermarrec, A., & van Steen, M. (2007). Gossip-based peer sampling. ACM Transactions on Computer Systems, 25(3).
[12] Ling, J., & Jiang, X. (2013). Distributed storage method based on information dispersal algorithm. 2013 2nd International Symposium on Instrumentation and Measurement, Sensor Network and Automation (IMSNA), pp. 624-626.
[13] Karger, D. R., & Ruhl, M. (2004). Simple efficient load balancing algorithms for peer-to-peer systems. Proceedings of the Sixteenth Annual ACM Symposium on Parallelism in Algorithms and Architectures, Barcelona, Spain. pp. 36-43.
[14] Mao, Y., & Ling, J. (2012). Research on load balance strategy based on grey prediction theory in cloud storage. Proceedings of the 2nd International Conference on Electronic and Mechanical Engineering and Information Technology (2012), pp. 199-203.
[15] McKusick, K., & Quinlan, S. (2010). GFS: evolution on fast-forward. Communications of the ACM, 53(3), 42-49.
[16] Randles, M., Lamb, D., & Taleb-Bendiab, A. (2010). A comparative study into distributed load balancing algorithms for cloud computing. Proceedings of the 2010 IEEE 24th International Conference on Advanced Information Networking and Applications Workshops, pp. 551-556.
[17] Rozier, E. W. D., Zhou, P., & Divine, D. (2013). Building intelligence for software defined data centers: Modeling usage patterns. Proceedings of the 6th International Systems and Storage Conference, Haifa, Israel. pp. 20:1-20:10.
[18] Shvachko, K., Kuang, H., Radia, S., & Chansler, R. (2010). The Hadoop distributed file system. 2010 IEEE 26th Symposium on Mass Storage Systems and Technologies (MSST), pp. 1-10.
[19] Stoica, I., Morris, R., Liben-Nowell, D., Karger, D. R., Kaashoek, M. F., Dabek, F., et al. (2003). Chord: A scalable peer-to-peer lookup protocol for internet applications. IEEE/ACM Transactions on Networking, 11(1), 17-32.
[20] Ye, X., Huang, M., Zhu, D., & Xu, P. (2012). A novel blocks placement strategy for Hadoop. ACIS International Conference on Computer and Information Science, 0, 3-7.

