National Digital Library of Theses and Dissertations in Taiwan

Detailed Record

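Graduate student: 游資婷
Graduate student (English name): YU TZU-TING
Thesis title (Chinese): 實現雲端運算Hadoop叢集儲存資料之差異分析
Thesis title (English): Identifying the Data Discrepancy Existing in Hadoop Clusters
Advisor: 葉佐任
Oral defense committee: 白英文, 黃文吉
Oral defense date: 2016-01-26
Degree: Master's
Institution: Fu Jen Catholic University (輔仁大學)
Department: Master's Program, Department of Computer Science and Information Engineering
Discipline: Engineering
Field: Electrical and Computer Engineering
Thesis type: Academic thesis
Year of publication: 2016
Graduation academic year: 104 (ROC calendar)
Language: Chinese
Number of pages: 46
Keywords (Chinese): Hadoop, HDFS, 雲端運算, 雲端備份
Keywords (English): Hadoop, HDFS, Cloud Computing, Cloud Backup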
Usage statistics:
  • Cited by: 0
  • Views: 243
  • Rating:
  • Downloads: 0
  • Saved to bibliography lists: 1
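With the rapid growth of the Internet, cloud computing has quickly gained popularity in recent years. Among the many cloud platform software packages, Hadoop is widely used. Hadoop is stable and highly practical, and it provides a simple, easy-to-use platform for processing large numbers of files.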

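Hadoop is a distributed system whose default distributed file system is HDFS (Hadoop Distributed File System). An HDFS cluster consists mainly of one NameNode and multiple DataNodes. The NameNode records where files are stored along with related file information, while the DataNodes are where the files are actually stored. HDFS splits each file into multiple blocks, stores the blocks on multiple DataNodes, and keeps several replicas of each block on different DataNodes. However, all of these replicas reside in the same cluster; if that site suffers a fire or another force majeure event that destroys the data, users lose important data. To guard against such loss, the data is backed up to a different cluster. During the backup, however, a transfer may fail because of a network disconnection or another unexpected fault, and the sender then cannot tell whether the data in the two clusters is consistent.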
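The block splitting and replication described above can be illustrated with a minimal Python sketch. It is not HDFS code: the function names, the round-robin placement, and the sizes in the example are assumptions chosen for illustration, and real HDFS uses a rack-aware placement policy rather than round-robin.

```python
def split_into_blocks(file_size, block_size):
    """Return the sizes of the blocks that a file of file_size bytes is cut into."""
    if file_size == 0:
        return []
    full, rest = divmod(file_size, block_size)
    # All blocks are full-sized except possibly the last one.
    return [block_size] * full + ([rest] if rest else [])


def place_replicas(num_blocks, datanodes, replication):
    """Assign each block to `replication` distinct DataNodes (simplified round-robin)."""
    if replication > len(datanodes):
        raise ValueError("replication factor exceeds the number of DataNodes")
    placement = {}
    for block in range(num_blocks):
        # Consecutive offsets modulo the node count never repeat a node
        # as long as replication <= len(datanodes).
        placement[block] = [
            datanodes[(block + r) % len(datanodes)] for r in range(replication)
        ]
    return placement


# Hypothetical example: a 300-byte file, 128-byte blocks, 4 DataNodes, 3 replicas.
blocks = split_into_blocks(300, 128)
layout = place_replicas(len(blocks), ["dn1", "dn2", "dn3", "dn4"], 3)
```

Every replica in `layout` still lives in a single cluster, which is the limitation that motivates backing up to a second cluster.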

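To solve this problem, we build a tree called the HDFS Namespace Numbering Tree (HDFS NSNT) and use it to generate an HDFS Namespace Numbering File (HDFS NSNF), which lists detailed information about each file. We generate an NSNF for each of the two clusters, compare the fields of the two NSNFs, and list where they differ, so that users can quickly find files that differ in whole or in part between the two clusters, which improves reliability.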
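The idea of numbering a namespace and comparing two listings can be sketched in Python as follows. This is only an illustration under stated assumptions: the abstract does not give the actual NSNT numbering scheme or the NSNF fields, so the depth-first numbering, the `size` and `checksum` fields, and the names `build_nsnf` and `compare_nsnf` are all hypothetical.

```python
def build_nsnf(entries):
    """Build a numbered listing from {path: (size, checksum)}.

    Paths are sorted by their components, which visits the namespace tree
    depth-first; each file gets the next number (a hypothetical scheme).
    """
    listing = []
    ordered = sorted(entries, key=lambda p: p.split("/"))
    for number, path in enumerate(ordered, start=1):
        size, checksum = entries[path]
        listing.append(
            {"no": number, "path": path, "size": size, "checksum": checksum}
        )
    return listing


def compare_nsnf(a, b):
    """Report paths present in only one listing and paths whose fields differ."""
    by_path_a = {e["path"]: e for e in a}
    by_path_b = {e["path"]: e for e in b}
    common = by_path_a.keys() & by_path_b.keys()
    return {
        "only_in_a": sorted(by_path_a.keys() - by_path_b.keys()),
        "only_in_b": sorted(by_path_b.keys() - by_path_a.keys()),
        "differing": sorted(
            p for p in common
            if (by_path_a[p]["size"], by_path_a[p]["checksum"])
            != (by_path_b[p]["size"], by_path_b[p]["checksum"])
        ),
    }


# Hypothetical metadata for a source cluster and its backup.
source = {"/data/a.txt": (10, "c1"), "/data/b.txt": (20, "c2"), "/logs/x.log": (5, "c3")}
backup = {"/data/a.txt": (10, "c1"), "/data/b.txt": (12, "zz")}
report = compare_nsnf(build_nsnf(source), build_nsnf(backup))
```

Here `report` flags `/logs/x.log` as missing from the backup and `/data/b.txt` as differing, which mirrors the whole-file and partial-file discrepancies the abstract mentions.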
In recent years, cloud computing has developed rapidly alongside the Internet. Among the many cloud computing platforms, Hadoop is widely used because of its stability and performance. It can easily handle a large number of files efficiently.

Hadoop is a distributed system, and the Hadoop Distributed File System (HDFS) is the default file system of the Hadoop platform. HDFS consists of a NameNode and multiple DataNodes. The NameNode records file metadata, including file location, file owner, and other related information. The DataNodes are where the files are actually stored. In general, each file is replicated on several DataNodes. However, file contents can still become unrecoverable if the NameNode is lost or if all DataNodes storing those files are destroyed at the same time. To address this problem, important files can be backed up on multiple Hadoop clusters. Nevertheless, errors can occur during file duplication.

We design and implement a scheme to identify discrepancies between Hadoop clusters, so that users can fix mismatches between files duplicated on different Hadoop clusters.
1 Introduction
1.1 Research Motivation
1.2 Research Objectives
1.3 Thesis Organization
2 Background
2.1 Hadoop Environment Architecture
2.2 Hadoop Distributed File System Architecture
2.2.1 NameNode
2.2.2 DataNode
2.2.3 Client
2.3 HDFS Data Flow
2.3.1 Reading Files from HDFS
2.3.2 Writing Files to HDFS
2.3.3 DataNode Failure and Heartbeat
2.4 MapReduce Mechanism
2.4.1 MapReduce Job Flow
2.4.2 MapReduce Job Scheduling
2.5 distcp Parallel Copy
2.6 Hadoop FileSystem Shell ls Command
2.7 HDFS and Other File Systems
2.8 HDFS Snapshot
3 Research Method and Design
3.1 Design Overview
3.2 Creating the HDFS NSNF File
3.2.1 Building the HDFS NSNT
3.2.2 HDFS NSNF File Information
3.2.3 Implementing the HDFS NSNF File
3.2.4 Command for Creating the HDFS NSNF File
3.3 Comparing the HDFS NSNFs Created by Two Clusters
3.3.1 Command for Comparing the HDFS NSNFs Created by Two Clusters
3.3.2 Comparing the HDFS NSNFs of Two Clusters
3.3.3 Comparing Against an HDFS NSNF Already Created by the User
4 Experiments and Result Analysis
4.1 Experimental Design
4.2 Experimental Environment
4.3 Experimental Method
4.3.1 Experiment 1: Results of Comparing the HDFS NSNF Files of Two Hadoop Clusters, Starting from Eclipse
4.3.2 Experiment 2: Results of Comparing the HDFS NSNF Files of Two Hadoop Clusters, Starting from Hadoop
4.3.3 Experiment 3: Results of Comparing the HDFS NSNF Files of Two Hadoop Clusters, Starting from Linux
4.4 Experiment Summary
5 Conclusion and Future Work
References