National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)

Detailed Record

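Author: 楊孟翰
Author (English): YANG, MENG-HAN
Title: 以樹莓派為基礎的Hadoop Cluster之執行時間改進方法
Title (English): Methods of Execution Time Improvement with Raspberry Pi Based Hadoop Cluster
Advisor: 林嬿文
Advisor (English): LIN, YEN-WEN
Committee members: 洪國寶, 王丕中, 顧維祺, 林嬿雯
Committee members (English): HORNG, GWO-BOA; WANG, PI-CHUNG; KU, WEI-CHI; LIN, YEN-WEN
Oral defense date: 2017-07-12
Degree: Master's
Institution: National Taichung University of Education (國立臺中教育大學)
Department: Department of Computer Science (資訊工程學系)
Discipline: Engineering
Field: Electrical Engineering and Computer Science
Thesis type: Academic thesis
Publication year: 2017
Academic year of graduation: 105 (ROC academic year, 2016-2017)
Language: Chinese
Pages: 60
Keywords (Chinese, translated): Internet of Things; big data; Raspberry Pi; cluster
Keywords (English): IoT; Big Data; Raspberry Pi; Cluster; Hadoop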
Usage counts:
  • Cited by: 0
  • Views: 1050
  • Downloads: 10
  • Saved to bibliography lists: 2
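The rise of the Internet of Things (IoT) is an important trend in the future development of network communication technology, and IoT-related research and applications are receiving growing attention. Because IoT devices are so numerous, they generate very large volumes of data at high speed, that is, big data. Analyzing, processing, and applying this data quickly and efficiently is therefore a significant challenge. Such data usually requires real-time processing. With traditional cloud computing, however, data is sent to a distant cloud for processing and the results are sent back, which takes more time. In addition, users must connect to a large remote data center to access services; if everyone shares a single data center, or if too many nodes produce data, the traffic consumes a large amount of network bandwidth and overloads the data center. This motivated fog computing, which offers similar functionality with weaker computing capability but processes data closer to the user's end devices. Doing so relieves potential network congestion, reduces network traffic, and in turn lightens the computational load on the data center. In this thesis, nine low-power Raspberry Pi 3 Model B boards are assembled into a small computing cluster. The cluster uses the Apache Hadoop Distributed File System (Apache HDFS), and the work aims to shorten MapReduce execution time on the Raspberry Pi cluster. Four methods and three Master-Slave Models are compared across five file sizes. Because the first method, running MapReduce in the default environment, does not achieve short execution times, three further methods are proposed: the second method modifies the disk utilization parameter; the third method sets, on every node, a location for temporarily storing remote resources; and the fourth method combines the second and third methods and reduces MapReduce execution time the most. With these methods, the problem of excessively long MapReduce execution times on a Raspberry Pi cluster can be mitigated.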
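The comparison described in this abstract (four methods, three Master-Slave Models, five file sizes) can be enumerated as a simple grid. The sketch below is only illustrative: the method names are taken from the thesis outline (sections 3.4 to 3.7) and the 1-to-2, 1-to-4, and 1-to-8 models from the captions of Figures 4.1 to 4.6, while the file-size labels are placeholders, because this record does not state the actual sizes. The function name `experiment_grid` is hypothetical.

```python
from itertools import product

# Method names as listed in the thesis outline (sections 3.4 to 3.7).
METHODS = [
    "Original Hadoop (Unhealthy)",
    "Not-Unhealthy",
    "Unhealthy + Cache",
    "Not-Unhealthy + Cache",
]

# Master-to-slave ratios taken from the captions of Figures 4.1 to 4.6.
MODELS = ["1-to-2", "1-to-4", "1-to-8"]

# ASSUMPTION: placeholder labels only; the record does not give the five sizes.
FILE_SIZES = ["size_1", "size_2", "size_3", "size_4", "size_5"]


def experiment_grid():
    """Return every (method, model, file_size) combination as a tuple."""
    return list(product(METHODS, MODELS, FILE_SIZES))


if __name__ == "__main__":
    grid = experiment_grid()
    print(len(grid), "configurations")
    for method, model, size in grid[:3]:
        print(method, "|", model, "|", size)
```

Each configuration would then be measured for both write and read execution time, which is how the result figures in Chapter 4 are organized.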
In recent years, the rapid development of computer and network technology has led to the emergence of the Internet of Things (IoT), whose development will be an important trend in the future. Research and applications related to IoT attract extensive attention. Because of the large number of IoT devices, large volumes of data are generated at high speed, and processing these data efficiently is an important challenge. These data usually require real-time processing. With traditional cloud computing, however, the data are sent to a remote cloud for processing and the sender must wait for the processed data to return, which takes more time. In addition, users have to connect to a remote centralized data center to access the service. When many users connect to the data center at the same time, they consume network bandwidth and add overhead to the data center. To remedy this problem, the concept of fog computing has been proposed. With fog computing, data processing is closer to the user's terminal device, so it relieves network congestion, effectively reduces network traffic, and lowers the computing load of the data center. In this thesis, nine low-power Raspberry Pi 3 Model B boards are set up as a small cluster. We use the Apache Hadoop Distributed File System (Apache HDFS) for our experiments and test the MapReduce execution time of four methods and three Master-Slave Models with five file sizes. Since the MapReduce execution time of the first method is too long, three other methods are proposed. The second method modifies the disk utilization parameter and reduces MapReduce execution time. The third method sets the temporary storage location for remote resources. The fourth method combines the second and third methods. Through these proposed methods, MapReduce execution time can be improved.
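As a rough illustration of the second and third methods, the sketch below builds Hadoop-style `<configuration>` fragments with Python's standard library. Setting `hadoop.tmp.dir` to `hdfs/tmp` in core-site.xml comes from the caption of Figure 3.29. The YARN property name and the value 98.5 are assumptions: this record says only that a disk utilization parameter was modified in yarn-site.xml, and the property shown is the standard NodeManager disk health threshold, which may or may not be the one the thesis changed. The helper names `build_configuration` and `to_xml` are hypothetical.

```python
import xml.etree.ElementTree as ET


def build_configuration(properties):
    """Build a Hadoop-style <configuration> element from a name -> value dict."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return root


def to_xml(root):
    """Serialize the element tree to a single-line XML string."""
    return ET.tostring(root, encoding="unicode")


# Method 2 (ASSUMED property and value): raise the per-disk utilization
# threshold that the NodeManager disk health checker compares against.
YARN_SITE = {
    "yarn.nodemanager.disk-health-checker."
    "max-disk-utilization-per-disk-percentage": 98.5,
}

# Method 3: temporary directory from the caption of Figure 3.29.
CORE_SITE = {"hadoop.tmp.dir": "hdfs/tmp"}

if __name__ == "__main__":
    print(to_xml(build_configuration(YARN_SITE)))
    print(to_xml(build_configuration(CORE_SITE)))
```

The fourth method would correspond to applying both fragments together on every node.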
Table of Contents
Abstract (Chinese)
Abstract (English)
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Research Background and Motivation
1.2 Research Objectives
1.3 Thesis Organization
Chapter 2 Background and Related Work
2.1 Internet of Things
2.2 Big Data
2.3 Hadoop
2.3.1 HDFS (Hadoop Distributed File System)
2.3.2 MapReduce
2.4 Fog Computing
2.5 Raspberry Pi Cluster
Chapter 3 Research Method
3.1 System Overview
3.2 Experimental Environment
3.2.1 Raspberry Pi 3 Model B
3.2.2 Network Equipment
3.2.2.1 Router
3.2.2.2 Switch
3.2.3 Power Supply Equipment
3.2.4 Software Installation Environment
3.2.5 Raspberry Pi Cluster Component
3.3 Hadoop File System and Computing Framework
3.3.1 Hadoop Terminology
3.3.2 HDFS (Hadoop Distributed File System)
3.3.3 MapReduce
3.3.4 Data Flow
3.4 Method 1: Original Hadoop (Unhealthy)
3.5 Method 2: Not-Unhealthy
3.6 Method 3: Unhealthy + Cache
3.7 Method 4: Not-Unhealthy + Cache
Chapter 4 Experiments and Performance Analysis
4.1 Experimental Objectives
4.1.1 Four Methods
4.1.2 Three Master-Slave Models
4.1.3 Five File Sizes
4.2 Experimental Test Tools
4.3 Experimental Results and Discussion
4.3.1 Three Master-Slave Models
4.3.2 Four Methods
4.3.3 Experimental Conclusions
Chapter 5 Conclusion
References
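List of Figures
Figure 3.1: System architecture
Figure 3.2: Implemented experimental platform
Figure 3.3: Configuring the IP addresses assigned by the router
Figure 3.4: Installed Java version is 1.8.0_91
Figure 3.5: Installed Hadoop version is 2.7.1
Figure 3.6: Installed protocol buffer version is 2.5.0
Figure 3.7: Traditional processing model
Figure 3.8: Parallel processing model
Figure 3.9: Parallel processing model of the Raspberry Pi cluster in this system
Figure 3.10: Starting the Namenode and Datanode
Figure 3.11: Namespace image and edit log in the Namenode
Figure 3.12: Namespace image and edit log in the Secondary namenode
Figure 3.13: Backup and merge process for the namespace image and edit log
Figure 3.14: MapReduce execution flow
Figure 3.15: Wordcount flowchart
Figure 3.16: Creating the wordcount sample file, named sample.txt
Figure 3.17: Editing the sample file contents (cmd text interface)
Figure 3.18: Editing the sample file contents (graphical interface)
Figure 3.19: Uploading sample.txt to the hdfs file system and confirming that the upload succeeded
Figure 3.20: Uploading sample.txt to the hdfs file system and confirming that the upload succeeded (viewed in the WebUI)
Figure 3.21: Running the wordcount program, using MapReduce to count the words in sample.txt
Figure 3.22: Locating the output file and printing the number of occurrences of each word
Figure 3.23: HDFS architecture and flowchart
Figure 3.24: Hadoop data flow with 32 cores
Figure 3.25: Starting the Hadoop distributed system and YARN
Figure 3.26: Viewing the NameNode in the Web UI
Figure 3.27: Viewing the ResourceManager in the Web UI
Figure 3.28: Setting parameters in the Hadoop configuration file yarn-site.xml
Figure 3.29: Setting hadoop.tmp.dir to hdfs/tmp in core-site.xml
Figure 3.30: Cache data can be handled once the hdfs/tmp path is set on every datanode
Figure 4.1: 1-to-2 write execution time
Figure 4.2: 1-to-2 read execution time
Figure 4.3: 1-to-4 write execution time
Figure 4.4: 1-to-4 read execution time
Figure 4.5: 1-to-8 write execution time
Figure 4.6: 1-to-8 read execution time
Figure 4.7: Unhealthy write execution time
Figure 4.8: Unhealthy read execution time
Figure 4.9: Not-unhealthy write execution time
Figure 4.10: Not-unhealthy read execution time
Figure 4.11: Unhealthy+Cache write execution time
Figure 4.12: Unhealthy+Cache read execution time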
Figure 4.13: Not-Unhealthy+Cache write execution time
Figure 4.14: Not-Unhealthy+Cache read execution time
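
Figures 3.15 to 3.22 in this list walk through a wordcount job. The following is a minimal single-process sketch of the map, shuffle, and reduce phases in plain Python. It does not use Hadoop, and the function names and the sample text are made up for illustration; they are not taken from the thesis.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every whitespace-separated word."""
    return [(word, 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group the emitted counts by key (the word)."""
    groups = defaultdict(list)
    for word, count in pairs:
        groups[word].append(count)
    return groups


def reduce_phase(groups):
    """Reduce: sum the grouped counts for each word."""
    return {word: sum(counts) for word, counts in groups.items()}


def wordcount(lines):
    """Run map, shuffle, and reduce over an iterable of text lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return reduce_phase(shuffle_phase(pairs))


if __name__ == "__main__":
    # Hypothetical sample text standing in for sample.txt.
    sample = ["hadoop on raspberry pi", "raspberry pi cluster"]
    for word, total in sorted(wordcount(sample).items()):
        print(word, total)
```

On a real cluster the map and reduce functions run on different nodes and the shuffle moves data across the network, which is where the execution-time differences measured in Chapter 4 arise.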


List of Tables
Table 2.1: Comparison of Google's GFS and Hadoop's HDFS
Table 3.1: Raspberry Pi Cluster Component
Table 3.2: Data management and allocation handled by the Namenode and Datanode