Thesis title: Methods of Execution Time Improvement with Raspberry Pi Based Hadoop Cluster
Keywords: IoT; Big Data; Raspberry Pi; Cluster; Hadoop
The rise of the Internet of Things (IoT) has made it an important trend in future network and communication technology, and IoT-related research and applications are attracting growing attention. Because the number of IoT devices is very large, they generate large volumes of data at high speed, that is, big data. Analyzing, processing, and applying these data quickly and efficiently is therefore an important challenge. Such data usually require real-time processing. With traditional cloud computing, however, the data must be sent to a remote cloud for processing and the results sent back, which takes more time. In addition, users must connect to a remote, centralized data center to access services; if everyone shares one data center, or if too many nodes generate data, the heavy use of network bandwidth overloads the data center. Fog computing, which offers similar functions with weaker computing power, has been proposed to remedy this problem. With fog computing, data processing takes place closer to the user's terminal device, which relieves network congestion, reduces network traffic, and lightens the computing load on the data center. In this thesis, nine low-power Raspberry Pi 3 Model B single-board computers are set up as a small computing cluster. The cluster uses the Apache Hadoop Distributed File System (Apache HDFS), and the aim is to improve the MapReduce execution time of the Raspberry Pi cluster. Four methods and three master-slave models are compared across five file sizes. Because the first method, running MapReduce in the default environment, does not achieve a short execution time, three further methods are proposed. The second method modifies the disk utilization parameter, which reduces MapReduce execution time. The third method sets, on each node, the location used to temporarily store remote resources. The fourth method combines the second and third methods and reduces MapReduce execution time the most. With these methods, the excessively long MapReduce execution time on the Raspberry Pi cluster can be improved.
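The abstract does not name the configuration parameters behind the second and third methods. As a rough sketch only, the Python snippet below assumes they correspond to two real YARN NodeManager properties in yarn-site.xml: the disk health checker's per-disk utilization threshold and the NodeManager local directories. The property mapping, the value 98.5, the path /tmp/hadoop-nm-local, and the helper name build_yarn_site are all illustrative assumptions, not settings taken from the thesis.

```python
import xml.etree.ElementTree as ET

# ASSUMPTION: the abstract names no parameters. These are real YARN
# properties that plausibly match methods 2 and 3; values are illustrative.
DISK_UTILIZATION = {
    "yarn.nodemanager.disk-health-checker."
    "max-disk-utilization-per-disk-percentage": "98.5",
}
LOCAL_DIRS = {
    # Hypothetical path for localized (remote) resources on each node.
    "yarn.nodemanager.local-dirs": "/tmp/hadoop-nm-local",
}


def build_yarn_site(method: int) -> ET.Element:
    """Build a <configuration> element for method 1, 2, 3, or 4.

    Method 1 is the default environment (no overrides), method 2 changes
    the disk utilization threshold, method 3 sets the local directories,
    and method 4 applies both.
    """
    if method not in (1, 2, 3, 4):
        raise ValueError("method must be 1, 2, 3, or 4")
    props = {}
    if method in (2, 4):
        props.update(DISK_UTILIZATION)
    if method in (3, 4):
        props.update(LOCAL_DIRS)

    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return root


if __name__ == "__main__":
    # Print the overrides that the combined (fourth) method would apply.
    print(ET.tostring(build_yarn_site(4), encoding="unicode"))
```

Under these assumptions, the printed fragment would be merged into yarn-site.xml on every node and the NodeManagers restarted before timing a MapReduce job.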
