跳到主要內容

臺灣博碩士論文加值系統

(44.222.82.133) 您好!臺灣時間:2024/09/15 21:29
字體大小: 字級放大   字級縮小   預設字形  
回查詢結果 :::

詳目顯示

我願授權國圖
: 
twitterline
研究生:鍾學毅
研究生(外文):Hsueh-YiChung
論文名稱:分散式檔案系統中之負載平衡:設計、實現與效能評估
論文名稱(外文):Decentralized Load Balancing in Distributed File Systems: Algorithm Design, Implementation and Performance Evaluation
指導教授:蕭宏章
指導教授(外文):Hung-Chang Hsiao
學位類別:碩士
校院名稱:國立成功大學
系所名稱:資訊工程學系碩博士班
學門:工程學門
學類:電資工程學類
論文種類:學術論文
論文出版年:2012
畢業學年度:100
語文別:英文
論文頁數:57
中文關鍵詞:雲端負載平衡分散式檔案系統
外文關鍵詞:clouddistributed file systemload balance
相關次數:
  • 被引用被引用:0
  • 點閱點閱:257
  • 評分評分:
  • 下載下載:0
  • 收藏至我的研究室書目清單書目收藏:0
MapReduce 是雲端計算的一種應用,其被用來處理、分析大量的資料,分散式檔案系統是影響其效能的關鍵部分。這種類型的檔案系統中,每個節點必需同時伴演計算與儲存的角色,一個檔案被切割成多份的區塊(chunk),並且被配置到不同的儲存節點(node)中,使得MapReduce 的程序可以在這些儲存節點中平行地去執行。然而在雲端的計算環境中,節點發生錯誤是正常的現象,而且節點可以任意的離開或加入於系統中,系統上的檔案也可以隨時的被建立、刪除以及複製,因此雲端系統存在著負載不平衡的問題,也就是檔案區塊沒有被平均得放在這個系統中有儲存節點中。在目前的分散式檔案系統中,將檔案區塊重新分配的工作通常都是高度依頼一個中央的節點,這種現象很明顯得不適用於現今的大規模且容易出錯的環境,因為這個中央的節點從大型系統中得到的大量工作很可能會使得它成為效能的瓶頸及發生單點失敗(single point of failure)的問題。所以在這篇論文中我們提出一個全分散式的負載平衡演算法來處理這種問題,並且我們將這個方法和現今的中央式方法和文獻中的其他分散式方法做比較,模擬實驗的結果顯示我們的方法和中央式的方法其效果相當,但不會有上述的中央式方法會有的缺點,並且優於另一個分散式的方法,我們比較的方式有討論負載不平衡因子、搬移檔案區塊的花費以及演算法本身額外的負擔。除了模擬,我們更進一步將提出的演算法實作在阿帕契(Apache Hadoop)分散式檔案系統中以探討其在真實叢集中的效能表現。
Distributed file systems are key building blocks for cloud computing applications based on the MapReduce programming paradigm. In such file systems, nodes simultaneously serve computing and storage functions; a file is partitioned into a number of chunks allocated in distinct nodes so that MapReduce tasks can be performed in parallel over the nodes. However, in a cloud computing environment, failure is the norm, and nodes maybe upgraded, replaced, and added in the system. Files can also be dynamically created, deleted, and appended. This results in load imbalance in a distributed file system; that is, the file chunks are not distributed as uniformly as possible among the nodes. Emerging distributed file systems in production systems strongly depend on a central node for chunk reallocation. This dependence is clearly inadequate in a large-scale, failure-prone environment because the central load balancer is put under considerable workload that is linearly scaled with the system size, and may thus become the performance bottleneck and the single point of failure. In this paper, a fully distributed load rebalancing algorithm is presented to cope with the load imbalance problem. Our algorithm is compared against a centralized approach in a production system and a competing distributed solution presented in the literature. The simulation results indicate that our proposal is comparable with the existing centralized approach and considerably outperforms the prior distributed algorithm in terms of load imbalance factor, movement cost, and algorithmic overhead. The performance of our proposal implemented in the Hadoop distributed file system is further investigated in a cluster environment.
[1] H. Abu-Libdeh, P. Costa, A. Rowstron, G. O'Shea, and A. Donnelly. Symbiotic
Routing in Future Data Centers. In Proc. ACM SIGCOMM'10, pages 51{62, Aug.
2010.
[2] Apache Hadoop. http://hadoop.apache.org/.
[3] A. Bharambe, M. Agrawal, and S. Seshan. Mercury: Supporting Scalable Multi-
Attribute Range Queries. In Proc. ACM SIGCOMM'04, pages 353{366, Aug.
2004.
[4] J. W. Byers, J. Considine, and M. Mitzenmacher. Simple Load Balancing for
Distributed Hash Tables. In Proc. 1st Int'l Workshop Peer-to-Peer Systems
(IPTPS'03), pages 80{87, Feb. 2003.
[5] G. Copeland, W. Alexander, E. Boughter, and T. Keller. Data Placement in
Bubba. In Proc. ACM SIGMOD'88, pages 99{108, June 1988.
[6] F. Dabek, M. F. Kaashoek, David Karger, R. Morris, and I. Stoica. Wide-Area
Cooperative Storage with CFS. In Proc. 18th ACM Symp. Operating Systems
Principles (SOSP'01), pages 202{215, Oct. 2001.
[7] J. Dean and S. Ghemawat. MapReduce: Simpli ed Data Processing on Large Clus-
ters. In Proc. 6th Symp. Operating System Design and Implementation (OSDI'04),
pages 137{150, Dec. 2004.
[8] G. DeCandia, D. Hastorun, M. Jampani, G. Kakulapati, A. Lakshman, A. Pilchin,
S. Sivasubramanian, P. Vosshall, and W. Vogels. Dynamo: Amazon's HighlyAvailable Key-value Store. In Proc. 21st ACM Symp. Operating Systems Principles
(SOSP'07), pages 205{220, Oct. 2007.
[9] D. Eastlake and P. Jones. US Secure Hash Algorithm 1 (SHA1). RFC 3174, Sept.
2001.
[10] P. Ganesan, M. Bawa, and H. Garcia-Molina. Online Balancing of Range-
Partitioned Data with Applications to Peer-to-Peer Systems. In Proc. 13th Int'l
Conf. Very Large Data Bases (VLDB'04), pages 444{455, Sept. 2004.
[11] M. R. Garey and D. S. Johnson. Computers and Intractability: A Guide to the
Theory of NP-Completeness. W.H. Freeman and Co., 1979.
[12] S. Ghemawat, H. Gobio , and S.-T. Leung. The Google File System. In Proc. 19th
ACM Symp. Operating Systems Principles (SOSP'03), pages 29{43, Oct. 2003.
[13] B. Godfrey and I. Stoica. Heterogeneity and Load Balance in Distributed Hash
Tables. In Proc. IEEE INFOCOM'05, pages 596{606, Mar. 2005.
[14] C. Guo, G. Lu, D. Li, H. Wu, X. Zhang, Y. Shi, C. Tian, Y. Zhang, and S. Lu.
BCube: A High Performance, Server-Centric Network Architecture for Modular
Data Centers. In Proc. ACM SIGCOMM'09, pages 63{74, Aug. 2009.
[15] Hadoop Distributed File System. http://hadoop.apache.org/hdfs/.
[16] Hadoop Distributed File System. Rebalancing Blocks. http://developer.yahoo.
com/hadoop/tutorial/module2.html#rebalancing.
[17] W. K. Hastings. Monte Carlo Sampling Methods Using Markov Chains and Their
Applications. Biometrika, 57(1):97{109, Apr. 1970.
[18] HDFS Federation. http://hadoop.apache.org/common/docs/r0.23.0/
hadoop-yarn/hadoop-yarn-site/Federation.html.
[19] H.-C. Hsiao and C.-W. Chang. A Symmetric Load Balancing Algorithm with
Performance Guarantees for Distributed Hash Tables. IEEE Trans. Computers,
2012. http://doi.ieeecomputersociety.org/10.1109/TC.2012.13.
[20] H.-C. Hsiao, H. Liao, S.-S. Chen, and K.-C. Huang. Load Balance with Imperfect
Information in Structured Peer-to-Peer Systems. IEEE Trans. Parallel Distrib.
Syst., 22(4):634{649, Apr. 2011.
[21] H.-C. Hsiao, Y.-C. Lin, and H. Liao. Building Small-World Peer-to-Peer Networks
Based on Hierarchical Structures. IEEE Trans. Parallel Distrib. Syst., 20(7):1023{
1037, July 2009.
[22] S. Iyer, A. Rowstron, and P. Druschel. Squirrel: A Decentralized Peer-to-Peer
Web Cache. In Proc. ACM PODC'02, pages 213{222, July 2002.
[23] M. Jelasity, A. Montresor, and O. Babaoglu. Gossip-Based Aggregation in Large
Dynamic Networks. ACM Trans. Comput. Syst., 23(3):219{252, Aug. 2005.
[24] M. Jelasity, S. Voulgaris, R. Guerraoui, A.-M. Kermarrec, and M. V. Steen.
Gossip-Based Peer Sampling. ACM Trans. Comput. Syst., 25(3), Aug. 2007.
[25] D. Karger and M. Ruhl. Simple E cient Load Balancing Algorithms for Peer-to-
Peer Systems. In Proc. 16th ACM Symp. Parallel Algorithms and Architectures
(SPAA'04), pages 36{43, June 2004.
[26] J. Kubiatowicz, D. Bindel, Y. Chen, S. E. Czerwinski, P. R. Eaton, D. Geels,
R. Gummadi, S. C. Rhea, H. Weatherspoon, W. Weimer, C. Wells, and B. Y.
Zhao. OceanStore: An Architecture for Global-Scale Persistent Storage. In Proc.
9th Int'l Conf. Architectural Support for Programming Languages and Operating
Systems (ASPLOS'00), pages 190{201, Nov. 2000.
[27] W.B. Ligon and R. B. Ross. Beowulf Cluster Computing with Linux, chapter
PVFS: Parallel Virtual File System, pages 391{430. MIT Press, Nov. 2001.
[28] Lustre. http://www.lustre.org/.
[29] G. S. Manku. Balanced Binary Trees for ID Management and Load Balance
in Distributed Hash Tables. In Proc. 23rd ACM Symp. Principles Distributed
Computing (PODC'04), pages 197{205, July 2004.
[30] K. McKusick and S. Quinlan. GFS: Evolution on Fast-Forward. Commun. ACM,
53(3):42{49, Jan. 2010.
[31] M. Mitzenmacher and E. Upfal. Probability and Computing: Randomized Al-
gorithms and Probabilistic Analysis, chapter Coupling of Markov Chains, pages
271{294. Cambridge University, 2005.
[32] L. M. Ni and K. Hwang. Optimal Load Balancing in a Multiple Processor System
with Many Job Classes. IEEE Trans. Software Eng., 11(5):491{496, May 1985.
[33] L. M. Ni, C.-W. Xu, and T. B. Gendreau. A Distributed Drafting Algorithm for
Load Balancing. IEEE Trans. Software Eng., 11(10):1153{1161, Oct. 1985.
[34] M. Raab and A. Steger. Balls into Bins|A Simple and Tight Analysis. LNCS
1518, pages 159{170, Oct. 1998.
[35] I. Raicu, I. T. Foster, and P. Beckman. Making a Case for Distributed File systems
at Exascale. In Proc. 3rd Int'l Workshop Large-Scale System and Application
Performance (LSAP'11), pages 11{18, June 2011.
[36] A. Rao, K. Lakshminarayanan, S. Surana, R. Karp, and I. Stoica. Load Balancing
in Structured P2P Systems. In Proc. 2nd Int'l Workshop Peer-to-Peer Systems
(IPTPS'02), pages 68{79, Feb. 2003.
[37] A. Rowstron and P. Druschel. Pastry: Scalable, Distributed Object Location and
Routing for Large-Scale Peer-to-Peer Systems. LNCS 2218, pages 161{172, Nov.
2001.
[38] A. Rowstron and P. Druschel. Storage Management and Caching in PAST, a
Large-Scale, Persistent Peer-to-Peer Storage Utility. In Proc. 18th ACM Symp.
Operating Systems Principles (SOSP'01), pages 188{201, Oct. 2001.
[39] H. Sagan. Space-Filling Curves. Springer, 1st edition, 1994.
[40] P. Scheuermann, G. Weikum, and P. Zabback. Data Partitioning and Load Bal-
ancing in Parallel Disk Systems. In Proc. 7th Int'l Conf. Very Large Data Bases
(VLDB'98), pages 48{66, Feb. 1998.
[41] F. B. Schmuck and R. L. Haskin. GPFS: A Shared-Disk File System for Large
Computing Clusters. In Proc. USENIX Conf. File and Storage Technologies
(FAST'02), pages 231{244, Jan. 2002.
[42] H. Shen and C.-Z. Xu. Locality-Aware and Churn-Resilient Load Balancing
Algorithms in Structured P2P Networks. IEEE Trans. Parallel Distrib. Syst.,
18(6):849{862, June 2007.
[43] H. Shen, C.-Z. Xu, and G. Chen. Cycloid: A Scalable Constant-Degree Lookup-
E cient P2P Overlay Network. Performance Evaluation, 63(3):195{216, Mar.
2006.
[44] I. Stoica, R. Morris, D. Liben-Nowell, D. R. Karger, M. F. Kaashoek, F. Dabek,
and H. Balakrishnan. Chord: a Scalable Peer-to-Peer Lookup Protocol for Internet
Applications. IEEE/ACM Trans. Netw., 11(1):17{21, Feb. 2003.
[45] J. Stribling, Y. Sovran, I. Zhang, X. Pretzer, J. Li, M. F. Kaashoek, and R. Morris.
Flexible, Wide-Area Storage for Distributed Systems with WheelFS. In Proc. 6th
USENIX Symp. Networked Systems Design and Implementation (NSDI'09), pages
43{58, Apr. 2009.
[46] S. Surana, B. Godfrey, K. Lakshminarayanan, R. Karp, and I. Stoica. Load Balanc-
ing in Dynamic Structured P2P Systems. Performance Evaluation, 63(6):217{240,
Mar. 2006.
[47] H. Tang, A. Gulbeden, J. Zhou, W. Strathearn, T. Yang, and L. Chu. A Self-
Organizing Storage Cluster for Parallel Data-Intensive Applications. In Proc. Int'l
Conf. High Performance Networking and Computing (SC'04), Nov. 2004.
[48] Ubuntu. http://www.ubuntu.com/.
[49] VMware. http://www.vmware.com/.
[50] Q. H. Vu, B. C. Ooi, M. Rinard, and K.-L. Tan. Histogram-Based Global Load
Balancing in Structured Peer-to-Peer Systems. IEEE Trans. Knowl. Data Eng.,
21(4):595{608, Apr. 2009.
[51] S. A. Weil, S. A. Brandt, E. L. Miller, D. D. E. Long, and C. Maltzahn. Ceph: A
Scalable, High-Performance Distributed File System. In Proc. 7th Symp. Operating
System Design and Implementation (OSDI'06), pages 307{320, Nov. 2006.
[52] B. Welch, M. Unangst, Z. Abbasi, G. Gibson, B. Mueller, J. Small, J. Zelenka,
and B. Zhou. Scalable Performance of the Panasas Parallel File System. In Proc.
6th USENIX Conf. File and Storage Technologies (FAST'08), pages 17{33, Feb.
2008.
[53] Xen. http://www.xen.org/.
[54] Y. Zhu and Y. Hu. E cient, Proximity-Aware Load Balancing for DHT-Based
P2P Systems. IEEE Trans. Parallel Distrib. Syst., 16(4):349{361, Apr. 2005.
連結至畢業學校之論文網頁點我開啟連結
註: 此連結為研究生畢業學校所提供,不一定有電子全文可供下載,若連結有誤,請點選上方之〝勘誤回報〞功能,我們會盡快修正,謝謝!
QRCODE
 
 
 
 
 
                                                                                                                                                                                                                                                                                                                                                                                                               
第一頁 上一頁 下一頁 最後一頁 top
無相關期刊