
National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)

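Graduate student: 蔡允哲 (Yun-Che Tsai)
Thesis title: Integration and Optimization Technologies for Multiple Big Data Processing Platforms (多重巨量資料處理平台之整合與最佳化技術)
Advisor: 張保榮 (Bao-Rong Chang)
Degree: Master's
Institution: National University of Kaohsiung (國立高雄大學)
Department: Department of Computer Science and Information Engineering (Master's Program)
Discipline: Engineering
Field: Electrical Engineering and Computer Science
Thesis type: Academic thesis
Year of publication: 2014
Academic year of graduation: 102 (ROC calendar)
Language: English
Number of pages: 63
Keywords: distributed memory storage; multiple big data processing platform; data warehouse; cloud computing; distributed file system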
Abstract: This study aims to build, on a cloud computing architecture, a multiple big data processing platform with high performance, high availability, and high scalability. The platform integrates Apache Hive, Cloudera Impala, and BDAS Shark so that it supports fast SQL queries in a big data environment. First, an optimizer designed in this study lets users work through a single access interface while the program automatically selects the big data warehouse platform with the best execution performance for each query. Second, the distributed memory storage system Memcached and the Apache Hadoop distributed file system (HDFS) cache the results of completed queries. When the same SQL command is submitted again, the result is retrieved directly from this cache, which avoids the long retrieval time of re-running the command on the big data warehouse platform. Together, these two mechanisms significantly improve overall performance; the reduction in query response time is largest in multi-user environments where SQL commands are highly repetitive.
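The query flow described in the abstract (check the cache, otherwise pick a platform, run the query, and store the result) can be sketched roughly as follows. This is only an illustrative sketch, not the thesis implementation: a Python dict stands in for Memcached, a local directory stands in for HDFS, and the names `cache_key`, `TwoTierCache`, `run_query`, `select_platform`, and `platforms` are all hypothetical. The rule the thesis uses to choose between Hive, Impala, and Shark is not given in this record, so the selector is passed in as a plain callable.

```python
import hashlib
import os
from typing import Callable, Dict, Optional


def cache_key(sql: str) -> str:
    """Hash the trimmed SQL text into a short, whitespace-free key.

    Assumption: hashing is used here because Memcached keys are limited
    in length and may not contain whitespace.
    """
    return hashlib.sha1(sql.strip().encode("utf-8")).hexdigest()


class TwoTierCache:
    """In-memory tier (stand-in for Memcached) backed by a disk tier
    (stand-in for an HDFS directory)."""

    def __init__(self, disk_dir: str) -> None:
        self.memory: Dict[str, str] = {}
        self.disk_dir = disk_dir

    def get(self, sql: str) -> Optional[str]:
        key = cache_key(sql)
        if key in self.memory:              # fast path: memory hit
            return self.memory[key]
        path = os.path.join(self.disk_dir, key)
        if os.path.exists(path):            # slower path: disk hit
            with open(path, encoding="utf-8") as f:
                result = f.read()
            self.memory[key] = result       # promote to the memory tier
            return result
        return None                         # miss in both tiers

    def put(self, sql: str, result: str) -> None:
        key = cache_key(sql)
        self.memory[key] = result
        with open(os.path.join(self.disk_dir, key), "w", encoding="utf-8") as f:
            f.write(result)


def run_query(
    sql: str,
    cache: TwoTierCache,
    select_platform: Callable[[str], str],
    platforms: Dict[str, Callable[[str], str]],
) -> str:
    """Return a cached result if one exists; otherwise run the query on the
    platform chosen by the (hypothetical) selector and cache the result."""
    cached = cache.get(sql)
    if cached is not None:
        return cached
    engine = platforms[select_platform(sql)]
    result = engine(sql)
    cache.put(sql, result)
    return result
```

In this sketch, a repeated query never reaches the warehouse engines a second time, and a result evicted from memory can still be served from the disk tier. Cache invalidation when the underlying tables change is left out.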
Table of contents:
Abstract (Chinese)
Abstract
Acknowledgments
Table of Contents
List of Figures
List of Tables
Chapter 1. Introduction
Chapter 2. Background and Related Work
2.1 Apache Hadoop
2.2 Apache Hive
2.3 Cloudera Impala
2.4 BDAS Spark/Shark
2.5 Hue
2.6 Memcached
Chapter 3. Research Method
3.1 Deployment of virtualized server environment
3.2 Integration of multiple big data analysis platform
3.3 Mechanism of the automatic platform selector
3.3.1 Research of the critical point
3.3.2 Getting the memory information from other servers
3.4 Development of caching mechanism
3.4.1 In-memory cache design
3.4.2 In-disk cache design
3.5 Program structure and flow of execution
Chapter 4. Experimental Results and Discussion
4.1 Experimental environment
4.2 The experimental results
4.2.1 Test environment I
4.2.2 Test environment II
4.2.3 Test environment III
Chapter 5. Conclusion
References