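Graduate student: 葉佩峰 (YEH, JOBA)
Thesis title: 巨量資料分析平台與服務之研究 (Research on Big Data Analysis Platform and Services)
Advisor: 林芳苓 (LIN, FANG-LING)
Oral defense committee: 王貞雅 (WANG, CHEN-YA); 林國平 (LIN, KUO-PING); 林芳苓 (LIN, FANG-LING)
Oral defense date: 2018-07-04
Degree: Master's
Institution: 龍華科技大學 (Lunghwa University of Science and Technology)
Department: Master's Program, Department of Information Management
Discipline category: Computer Science
Discipline subcategory: General Computer Science
Year of publication: 2018
Graduation academic year: 106 (ROC calendar)
Language: Chinese
Number of pages: 85
Keywords: Cloudera; Hadoop; Spark; VMware ESXi; MKSBackup; FreeNAS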
With the rapid development of Internet technology and the rise of the Internet of Things, enterprises have accumulated huge volumes of data in many different forms. Traditional data analysis frameworks cannot handle data at this scale, so in recent years companies have been working out how to confront big data and create high added value from it in their business activities. Compared with other countries, Taiwan has relatively few teaching materials and academic studies on big data analysis platforms, and this thesis aims to contribute to research in that area.
Since Hadoop was introduced as a subproject of the Apache Foundation in 2005, it has opened the door to big data research, and many distributions have branched off from it. Among them, Cloudera is the distribution best known for operating on a commercial model while providing high compatibility and stability. Taking Cloudera as its basis, this study explores Hadoop distributions and the corresponding virtualization and backup strategies, and divides the work into two parts according to intended use.
The first part explores Open Virtualization Format (.OVF) virtual machine files suited to teaching and personal use. The second part takes Lunghwa University of Science and Technology as a case study and explores how to build a cluster with VMware ESXi under limited resources and how to implement a robust backup strategy with MKSBackup.
The results can serve as a reference example for small and medium-sized enterprises and SOHO workers weighing budget against cost, and even for private schools with scarce resources. They are intended to make a self-built big data environment less difficult to set up and to raise the technical level of big data analysis platforms in Taiwan.

Abstract (Chinese)
Abstract (English)
Acknowledgments
Table of Contents
List of Tables
List of Figures
Chapter 1 Introduction
1.1 Research Background
1.2 Research Motivation
1.3 Research Objectives
1.4 Research Process
Chapter 2 Literature Review
2.1 Hadoop
2.1.1 HDFS
2.1.2 MapReduce
2.1.3 Spark
2.2 Hadoop Distributions
2.2.1 MapR
2.2.2 Cloudera
2.2.3 Hortonworks
2.3 Hadoop Distributions
2.3.1 Ubuntu
2.3.2 CentOS
2.4 Virtualization
2.4.1 Classification of Virtualization Technologies
2.4.2 Instruction Set Architecture Level Virtualization
2.4.3 Hardware Abstraction Layer Level Virtualization
2.4.4 Operating System Level Virtualization
2.4.5 Programming Language Level Virtualization
2.4.6 Library Level Virtualization
2.5 Virtualization Software
2.5.1 Citrix
2.5.2 VMware
2.5.3 VirtualBox
2.5.4 Microsoft Hyper-V
2.6 Firewalls
2.6.1 Fortinet
2.6.2 Check Point
2.6.3 Palo Alto Networks
2.7 Storage Devices
2.7.1 FreeNAS
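Chapter 3 System Planning and Design
3.1 Solution Evaluation
3.1.1 Hadoop Distribution Evaluation
3.1.2 Personal Virtualization Software Evaluation
3.1.3 Server Virtualization Software Evaluation
3.1.4 Firewall Evaluation
3.1.5 Storage Device Evaluation
3.2 Solution Selection
3.2.1 Hadoop Distribution Selection
3.2.2 Personal Virtualization Software Selection
3.2.3 Server Virtualization Software Selection
3.2.4 Firewall Selection
3.2.5 Storage Device Selection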
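Chapter 4 System Implementation
4.1 Cloudera Installation Plans
4.2 Cloudera Installation Plan Selection
4.3 Open Virtualization Format Files
4.3.1 Virtual Machine Specifications
4.3.2 Virtual Machine Installation Process
4.4 Virtualized Cluster Solution for Cloudera
4.4.1 Virtual Machine Planning
4.4.2 Virtualization Overcommitment
4.4.3 Virtual Machine Installation Process
4.4.4 Disk Planning
4.4.5 Backup Planning
4.4.6 Firewall and Network Planning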
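Chapter 5 System Comparison and Analysis
5.1 System Comparison
5.1.1 Time Cost
5.1.2 Cluster Cost
5.2 System Analysis
5.2.1 Horizontal Scaling of the Cluster and Its Limits
5.2.2 YARN Processor Allocation and Planning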
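Chapter 6 Conclusions and Future Research Directions
6.1 Research Limitations
6.2 Future Research Directions
6.2.1 Operating System Repackaging
6.2.2 Performance Study of Erasure Coding Combined with SSM
6.2.3 Hadoop Performance Testing Standards for Heterogeneous Platforms
References
Appendix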


