Thesis title (English):Decentralized Load Balancing in Distributed File Systems: Algorithm Design, Implementation and Performance Evaluation
Advisor (English):Hung-Chang Hsiao
Keywords (English):cloud; distributed file system; load balance
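MapReduce is a cloud computing application used to process and analyze large volumes of data, and the distributed file system is a key factor in its performance. In this type of file system, each node must serve as both a computing node and a storage node; a file is split into multiple chunks that are allocated to different storage nodes so that MapReduce tasks can run in parallel on those nodes. In a cloud computing environment, however, node failure is the norm, nodes can leave or join the system at any time, and files can be created, deleted, and replicated at any time. Cloud systems therefore suffer from load imbalance; that is, file chunks are not evenly distributed across all the storage nodes in the system. In current distributed file systems, chunk reallocation usually depends heavily on a central node. This is clearly unsuitable for today's large-scale, failure-prone environments, because the heavy workload that a large system places on the central node is likely to make it a performance bottleneck and a single point of failure. In this thesis, we therefore propose a fully distributed load-balancing algorithm to address this problem, and we compare it with an existing centralized approach and with another distributed approach from the literature. Simulation results show that our method performs comparably to the centralized approach without its drawbacks and outperforms the other distributed approach. The comparison covers the load imbalance factor, the cost of moving file chunks, and the overhead of the algorithm itself. Beyond simulation, we also implemented the proposed algorithm in the Apache Hadoop distributed file system to examine its performance in a real cluster.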
Distributed file systems are key building blocks for cloud computing applications based on the MapReduce programming paradigm. In such file systems, nodes simultaneously serve computing and storage functions; a file is partitioned into a number of chunks allocated to distinct nodes so that MapReduce tasks can be performed in parallel over the nodes. However, in a cloud computing environment, failure is the norm, and nodes may be upgraded, replaced, and added to the system. Files can also be dynamically created, deleted, and appended. This results in load imbalance in a distributed file system; that is, the file chunks are not distributed as uniformly as possible among the nodes. Distributed file systems currently in production depend strongly on a central node for chunk reallocation. This dependence is clearly inadequate in a large-scale, failure-prone environment because the central load balancer is put under a considerable workload that scales linearly with the system size, and may thus become a performance bottleneck and a single point of failure. In this paper, a fully distributed load rebalancing algorithm is presented to cope with the load imbalance problem. Our algorithm is compared against a centralized approach in a production system and a competing distributed solution presented in the literature. The simulation results indicate that our proposal is comparable with the existing centralized approach and considerably outperforms the prior distributed algorithm in terms of load imbalance factor, movement cost, and algorithmic overhead. The performance of our proposal implemented in the Hadoop distributed file system is further investigated in a cluster environment.
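
To make the load imbalance problem concrete, the following Python sketch measures how far a set of nodes is from a uniform chunk distribution and computes a set of chunk moves that restores balance. It is only an illustration of the goal, not the algorithm proposed in the thesis: it assumes a global view of all node loads (which is exactly what a decentralized scheme avoids), the function names `imbalance_factor` and `rebalance_plan` are hypothetical, and the imbalance measure used here is an assumed definition rather than the one used in the evaluation.

```python
def imbalance_factor(loads):
    """Illustrative measure (an assumption, not the thesis's definition):
    largest absolute deviation from the ideal per-node load, divided by
    that ideal load."""
    ideal = sum(loads.values()) / len(loads)
    if ideal == 0:
        return 0.0
    return max(abs(n - ideal) for n in loads.values()) / ideal


def rebalance_plan(loads):
    """Compute chunk moves that bring every node to its target load.

    `loads` maps a node name to the number of chunks it stores.
    Returns (moves, target), where moves is a list of
    (source, destination, chunk_count) tuples.
    Uses a global view of the system, for illustration only.
    """
    total = sum(loads.values())
    base, extra = divmod(total, len(loads))
    # Heaviest nodes keep the leftover chunks so fewer chunks have to move.
    order = sorted(loads, key=loads.get, reverse=True)
    target = {node: base + (1 if i < extra else 0)
              for i, node in enumerate(order)}

    surplus = [[n, loads[n] - target[n]] for n in order if loads[n] > target[n]]
    deficit = [[n, target[n] - loads[n]] for n in order if loads[n] < target[n]]

    moves = []
    i = j = 0
    while i < len(surplus) and j < len(deficit):
        # Move as many chunks as the current pair can give and take.
        count = min(surplus[i][1], deficit[j][1])
        moves.append((surplus[i][0], deficit[j][0], count))
        surplus[i][1] -= count
        deficit[j][1] -= count
        if surplus[i][1] == 0:
            i += 1
        if deficit[j][1] == 0:
            j += 1
    return moves, target


if __name__ == "__main__":
    loads = {"n1": 10, "n2": 2, "n3": 0}
    moves, target = rebalance_plan(loads)
    print(imbalance_factor(loads), moves, imbalance_factor(target))
```

In this sketch the total number of chunks listed in `moves` plays the role of a movement cost, and the need to collect every node's load in one place is the kind of central workload that grows with system size.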
