National Digital Library of Theses and Dissertations in Taiwan

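Graduate student: 羅祥福
Graduate student (English name): LO, HSIANG-FU
Thesis title: 啟發式雲端平台自我效能優化機制之研究 (literal translation: A Study of a Heuristic Self-Performance-Optimization Mechanism for Cloud Platforms)
Thesis title (English): Study of Performance Optimization Scheme for Hadoop MapReduce Architecture
Advisors: 劉江龍, 劉豐豪, 張克勤
Advisors (English names): LIU, CHIANG-LUNG; LIU, FONG-HAO; CHANG, KO-CHIN
Oral examination committee: 吳嘉龍, 洪敏雄, 黃冠寰, 詹前隆, 董德國, 郝樹聲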
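Oral defense date: 2016-07-07
Degree: Doctoral
Institution: 國防大學理工學院 (Chung Cheng Institute of Technology, National Defense University)
Department: 國防科學研究所 (Graduate School of Defense Science)
Field: Military, Police, and National Defense Security
Discipline: Military Studies
Thesis type: Academic thesis
Year of publication: 2016
Academic year of graduation: 104 (ROC calendar)
Language: Chinese
Number of pages: 79
Keywords (Chinese): Hadoop parameter optimization; performance tuning; ant colony optimization algorithm; gene expression programming
Keywords (English): Hadoop Configuration Optimization; Performance Tuning; Ant Colony Optimization (ACO); Gene Expression Programming (GEP)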
Usage statistics: cited 1 time; 188 views; 33 downloads; bookmarked 0 times.
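Abstract (translated from the Chinese):
Cloud technologies are now widely adopted, and Big Data keeps growing sharply. As a result, the computational performance of big-data processing has become an important research issue. This thesis studies how to measure the performance of the Hadoop platform and how to tune it, and it proposes corresponding improvement methods.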
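Existing MapReduce information hiding applications lack performance measurement. To address this, the thesis proposes the Performance AnalysiS Scheme for MapReduce Information Hiding (PASS-MIH), which analyzes the factors that affect the performance of Hadoop information hiding. In experiments, PASS-MIH analyzes and measures four levels of performance impact factors for the MR-based LSB case study. Combined with an existing Hadoop optimization method, it achieves a performance improvement of up to 53.8%. PASS-MIH exposes impact factors at many levels, so the thesis also proposes the Comprehensive Performance Rating (CPR) model. CPR filters these factors down to the key parameters that affect Hadoop performance and measures the effect of tuning them. CPR uses principal component analysis (PCA) to identify nine key Hadoop parameters. The experiments show that adjusting these key parameters affects performance non-linearly, and that the parameters can guide Hadoop performance tuning.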
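The abstract does not say exactly how CPR turns principal component analysis into a shortlist of nine parameters. The sketch below assumes one common approach; it is not taken from the thesis:

1. Standardize a job-history matrix that has one column per parameter or workload feature.
2. Extract the leading principal components of its correlation matrix.
3. Rank the columns by their eigenvalue-weighted squared loadings.

The code uses only the standard library. Eigenpairs come from power iteration with Hotelling deflation. The column names and data are hypothetical.

```python
import math


def standardize(rows):
    """Z-score each column (sample std); constant columns become zeros."""
    n, p = len(rows), len(rows[0])
    cols = []
    for j in range(p):
        col = [r[j] for r in rows]
        mean = sum(col) / n
        sd = math.sqrt(sum((x - mean) ** 2 for x in col) / (n - 1))
        cols.append([(x - mean) / sd if sd > 0 else 0.0 for x in col])
    return [[cols[j][i] for j in range(p)] for i in range(n)]


def correlation_matrix(z):
    """Correlation matrix of already standardized data."""
    n, p = len(z), len(z[0])
    return [[sum(z[i][a] * z[i][b] for i in range(n)) / (n - 1)
             for b in range(p)] for a in range(p)]


def power_iteration(m, iters=500):
    """Dominant eigenpair of a symmetric matrix by power iteration."""
    p = len(m)
    v = [1.0 + 0.1 * i for i in range(p)]  # fixed, non-uniform start vector
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iters):
        w = [sum(m[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0, v
        v = [x / norm for x in w]
    # Rayleigh quotient gives the eigenvalue for the converged vector.
    lam = sum(v[a] * sum(m[a][b] * v[b] for b in range(p)) for a in range(p))
    return lam, v


def rank_parameters(rows, names, n_components=2, top_n=3):
    """Score each column by eigenvalue-weighted squared loadings on top PCs."""
    m = correlation_matrix(standardize(rows))
    p = len(names)
    scores = [0.0] * p
    for _ in range(n_components):
        lam, v = power_iteration(m)
        if lam <= 1e-12:
            break
        for j in range(p):
            scores[j] += lam * v[j] ** 2
        # Hotelling deflation removes the component just found.
        m = [[m[a][b] - lam * v[a] * v[b] for b in range(p)] for a in range(p)]
    order = sorted(range(p), key=lambda j: scores[j], reverse=True)
    return [(names[j], round(scores[j], 3)) for j in order[:top_n]]


# Hypothetical, synthetic records: param_b is a scaled copy of param_a,
# and param_c is uncorrelated with both.
SAMPLE_NAMES = ["param_a", "param_b", "param_c"]
SAMPLE_ROWS = [[1, 2, 1], [2, 4, -1], [3, 6, -1], [4, 8, 1]]

if __name__ == "__main__":
    print(rank_parameters(SAMPLE_ROWS, SAMPLE_NAMES, n_components=1, top_n=2))
```

With one component, the two correlated columns share the first principal component and rank highest. The uncorrelated column only scores once a second component is allowed.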
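Hadoop parameters urgently need automatic tuning (auto-tuning). To meet this need, the thesis proposes an ACO-based Hadoop performance optimization mechanism (ACO-HCO), which avoids the extra overhead of collecting job profiles. ACO-HCO combines Ant Colony Optimization (ACO) with Gene Expression Programming (GEP) in two steps:
1. GEP mines a model that relates the key Hadoop parameters to performance, using historical Hadoop job records.
2. ACO uses this model as heuristic information for path selection and automatically searches for optimized Hadoop parameters, which strengthens platform performance.

In experiments, ACO-HCO delivers better execution performance than two representative Hadoop parameter optimization methods: the Starfish mechanism and industry rules of thumb (RoT).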
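The abstract describes ACO-HCO only at a high level. The Python sketch below shows the general shape of ACO over a discrete configuration space:

- Each parameter is a layer of candidate values.
- An ant picks one value per layer, with probability proportional to τ^α · η^β.
- Pheromone evaporates each iteration and is reinforced along the iteration-best configuration.

Several pieces are assumptions for illustration and are not the thesis's GEP-mined model or its Table 4.5 settings:

- the surrogate `predicted_runtime`;
- the candidate value lists;
- the heuristic η, taken as the inverse predicted runtime when only one parameter changes from its default;
- the pheromone floor;
- all numeric constants.

```python
import itertools
import random

# Hypothetical candidate values per parameter (Hadoop 1.x property names).
CANDIDATES = {
    "io.sort.mb": [100, 200, 300, 400],
    "mapred.reduce.tasks": [1, 4, 8, 16],
    "io.sort.factor": [10, 25, 50, 100],
}
DEFAULTS = {"io.sort.mb": 100, "mapred.reduce.tasks": 1, "io.sort.factor": 10}


def predicted_runtime(cfg):
    """Hypothetical surrogate (seconds) standing in for a GEP-mined model."""
    return (60.0
            + (cfg["io.sort.mb"] - 200) ** 2 / 1000.0
            + 4.0 * abs(cfg["mapred.reduce.tasks"] - 8)
            + 200.0 / cfg["io.sort.factor"])


def aco_search(candidates, cost, default, n_ants=20, n_iters=50,
               alpha=1.0, beta=2.0, rho=0.1, q=1.0, tau_min=0.01, seed=0):
    rng = random.Random(seed)
    params = list(candidates)
    # Pheromone tau[p][k] for choosing the k-th candidate value of parameter p.
    tau = {p: [1.0] * len(candidates[p]) for p in params}
    # Heuristic eta: inverse predicted cost when only p changes from default.
    eta = {p: [1.0 / cost({**default, p: v}) for v in candidates[p]]
           for p in params}
    best_cfg, best_cost = dict(default), cost(default)
    for _ in range(n_iters):
        it_cfg, it_cost = None, float("inf")
        for _ in range(n_ants):
            # Each ant walks the layers, picking one value per parameter.
            cfg = {}
            for p in params:
                w = [tau[p][k] ** alpha * eta[p][k] ** beta
                     for k in range(len(candidates[p]))]
                k = rng.choices(range(len(w)), weights=w)[0]
                cfg[p] = candidates[p][k]
            c = cost(cfg)
            if c < it_cost:
                it_cfg, it_cost = cfg, c
        if it_cost < best_cost:
            best_cfg, best_cost = dict(it_cfg), it_cost
        # Evaporate everywhere, deposit on the iteration-best path, keep a floor.
        for p in params:
            for k, v in enumerate(candidates[p]):
                deposit = q / it_cost if it_cfg[p] == v else 0.0
                tau[p][k] = max(tau_min, (1.0 - rho) * tau[p][k] + deposit)
    return best_cfg, best_cost


if __name__ == "__main__":
    cfg, c = aco_search(CANDIDATES, predicted_runtime, DEFAULTS)
    print("ACO best:", cfg, round(c, 2))
    # The toy space has only 64 points, so an exhaustive check is cheap here.
    grid = [dict(zip(CANDIDATES, vals))
            for vals in itertools.product(*CANDIDATES.values())]
    print("Exhaustive best:", min(map(predicted_runtime, grid)))
```

The best-so-far configuration is kept separately from the pheromone trail, so a good configuration found early is never lost. The floor `tau_min`, in the spirit of MAX-MIN Ant System, keeps every candidate value selectable.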

Abstract (English):
As the use of cloud computing grows rapidly, the volume of Big Data also continues to increase quickly, and the performance of big data processing has become an important research issue. This thesis discusses performance measurement methods and performance tuning schemes for Hadoop MapReduce, and it proposes corresponding performance improvement methods.
To measure the performance of Hadoop information hiding applications, the thesis proposes the Performance AnalysiS Scheme for MapReduce Information Hiding (PASS-MIH) model. PASS-MIH analyzes and measures the factors that affect the performance of these applications. Experimental results show that it can estimate four levels of performance impact factors for the MR-based LSB test case. When combined with an existing Hadoop parameter tuning method, it achieves a performance improvement of up to 53.8%. In addition, the Comprehensive Performance Rating (CPR) model uses principal component analysis to identify nine principal components, drawn from workload history and Hadoop configuration, that strongly affect Hadoop performance. Experimental results indicate that tuning these principal Hadoop configuration parameters affects performance non-linearly.
The thesis also proposes an ACO-based Hadoop Configuration Optimization (ACO-HCO) scheme, which optimizes Hadoop performance by automatically tuning its configuration parameter settings. ACO-HCO works in two steps:
1. Gene expression programming (GEP) builds an objective function from historical job execution records. This function models the relationship between the Hadoop configuration parameters and performance.
2. Ant colony optimization (ACO) uses the objective function to search for optimal or near-optimal parameter settings.

Experimental results show that ACO-HCO significantly improves Hadoop performance compared with the default settings. It also outperforms both rule-of-thumb (RoT) settings and the Starfish model in Hadoop performance optimization.
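ACO-HCO uses GEP to mine its objective function. GEP encodes each candidate model as a fixed-length gene: the head may hold functions or terminals, and the tail holds only terminals. The gene is read breadth-first (Karva notation) into an expression tree, as in the chromosome and expression-tree example listed as Figure 2.3. The sketch below shows only that decoding plus a root-mean-square-error fitness on job records; selection, mutation, and transposition are omitted. The function set, the gene, and the variables `a`, `b`, `c` are illustrative assumptions, not the thesis's Table 4.2 functions or Table 4.1 parameters.

```python
import math

# Function set: symbol -> (arity, implementation). Protected operators avoid
# runtime errors on bad inputs, as is usual in GEP.
FUNCS = {
    "+": (2, lambda a, b: a + b),
    "-": (2, lambda a, b: a - b),
    "*": (2, lambda a, b: a * b),
    "/": (2, lambda a, b: a / b if b != 0 else 1.0),
    "Q": (1, lambda a: math.sqrt(abs(a))),
}


def tail_length(head_len, max_arity=2):
    """Ferreira's rule t = h(n - 1) + 1 guarantees a valid tree for any head."""
    return head_len * (max_arity - 1) + 1


def decode_karva(gene):
    """Read a K-expression breadth-first into a tree of [symbol, children]."""
    nodes = [[sym, []] for sym in gene]
    level, nxt = [nodes[0]], 1
    while level:
        children_level = []
        for node in level:
            arity = FUNCS[node[0]][0] if node[0] in FUNCS else 0
            for _ in range(arity):
                child = nodes[nxt]
                nxt += 1
                node[1].append(child)
                children_level.append(child)
        level = children_level
    return nodes[0], nxt  # nxt = length of the open reading frame (ORF)


def evaluate(node, env):
    """Evaluate the tree with variable values taken from env."""
    sym, children = node
    if sym in FUNCS:
        return FUNCS[sym][1](*(evaluate(c, env) for c in children))
    return env[sym]


def to_infix(node):
    """Render the tree as a readable infix expression."""
    sym, children = node
    if sym == "Q":
        return f"sqrt({to_infix(children[0])})"
    if sym in FUNCS:
        return f"({to_infix(children[0])} {sym} {to_infix(children[1])})"
    return sym


def rmse(tree, records):
    """Fitness: root-mean-square error of predicted vs. observed runtime."""
    errs = [(evaluate(tree, env) - y) ** 2 for env, y in records]
    return math.sqrt(sum(errs) / len(errs))


# Hypothetical gene: head "*+Qa" (h = 4), tail "bcabc" (t = 5).
# Variables a, b, c are placeholder stand-ins for tuned Hadoop parameters.
HEAD_LEN = 4
GENE = "*+Qabcabc"
assert len(GENE) == HEAD_LEN + tail_length(HEAD_LEN)

if __name__ == "__main__":
    tree, orf = decode_karva(GENE)
    print(to_infix(tree), "ORF length:", orf)
    records = [({"a": 2.0, "b": 3.0, "c": 16.0}, 20.0),
               ({"a": 1.0, "b": 1.0, "c": 4.0}, 4.0)]
    print("RMSE:", rmse(tree, records))
```

Only the first six symbols form the open reading frame. The rest of the tail is carried along unused, which is what lets GEP mutate genes freely without ever producing an invalid tree.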

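Table of Contents
Acknowledgments
Abstract (Chinese)
Abstract (English)
Table of Contents
List of Figures
List of Tables
List of Abbreviations (Chinese-English)
1. Introduction
1.1. Research Background
1.2. Research Motivation and Objectives
1.3. Thesis Organization
2. Literature Review
2.1. The MapReduce Cloud Computing Architecture and the Hadoop Platform
2.2. Hadoop Configuration Parameters
2.3. Gene Expression Programming
2.4. Hadoop Parameter Optimization Methods
3. Hadoop Performance Measurement Schemes
3.1. Introduction
3.2. The PASS-MIH Performance Measurement Model
3.2.1 PASS-MIH Measurement Framework
3.2.2 Experimental Environment and Setup
3.2.3 Experimental Results and Analysis
3.3. The Comprehensive Performance Rating Model
3.3.1 CPR Measurement Framework
3.3.2 CPR Principal Components of Hadoop Parameters and the Comprehensive Performance Model
3.3.3 Experimental Environment and Setup
3.3.4 Experimental Results and Analysis
3.4. Summary
4. The ACO-HCO Performance Optimization Scheme
4.1. Introduction
4.2. Overall Architecture of the ACO-HCO Scheme
4.3. ACO-HCO Feature Analysis
4.4. ACO-HCO Parameter Correlation Modeling
4.5. ACO-HCO Parameter Optimization
4.6. Experimental Environment and Results Analysis
4.6.1 Experimental Environment and Setup
4.6.2 Experimental Results and Analysis
4.7. Summary
5. Conclusions and Future Research Directions
5.1. Conclusions
5.2. Future Research Directions
References
Publications
Autobiography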
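List of Figures
Figure 2.1 MapReduce execution architecture [5]
Figure 2.2 Hadoop platform architecture and an example of role assignment
Figure 2.3 Example of a GEP chromosome and its expression tree
Figure 2.4 Example of GEP mining of correlations among Hadoop parameters
Figure 3.1 Execution process of the MR-based LSB Hadoop program
Figure 3.2 Execution time analysis of the MR-based LSB application
Figure 3.3 CPR measurement framework
Figure 3.4 Execution time on different clusters as the input data size is varied under CPR guidance
Figure 3.5 Ratio of Map to Reduce task execution time as the cluster size is varied under CPR guidance
Figure 3.6 Measured versus model-estimated execution time as the number of Map tasks is varied under CPR guidance
Figure 3.7 Results of tuning the Map task output memory buffer (io.sort.mb)
Figure 3.8 Results of tuning the number of Reduce tasks
Figure 3.9 Results of tuning multiple key parameters
Figure 4.1 Overall architecture of the ACO-HCO performance optimization scheme
Figure 4.2 User interface for building the GEP correlation model of Hadoop parameters
Figure 4.3 ACO-HCO parameter optimization development tool
Figure 4.4 Optimized Hadoop parameters output by ACOHadoop
Figure 4.5 Execution performance comparison of ACO-HCO for the WordCount program
Figure 4.6 Execution performance comparison of ACO-HCO for the Sort program
Figure 4.7 Execution performance comparison of ACO-HCO for the MR-based LSB program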


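List of Tables
Table 2.1 Summary of selected Hadoop parameters
Table 2.2 Comparison of Hadoop parameter optimization methods
Table 3.1 Hardware specifications of the Hadoop experimental environment for MapReduce information hiding performance measurement
Table 3.2 Measured performance improvement points for the MR-based LSB application
Table 3.3 Key Hadoop parameters from principal component analysis in the CPR measurement framework
Table 3.4 Notation for the HDFS performance model
Table 3.5 Notation for the MapReduce performance model
Table 3.6 Hardware specifications of the Hadoop experimental environment for CPR performance measurement
Table 4.1 Key Hadoop parameters used by GEP
Table 4.2 Mathematical functions used by GEP to build the model relating Hadoop parameters to performance
Table 4.3 Sample training data for the GEP model relating Hadoop parameters to performance
Table 4.4 GEP algorithm implementation of the model relating Hadoop parameters to performance
Table 4.5 Hadoop parameter settings used by ACO-HCO
Table 4.6 Hardware specifications of the Hadoop experimental environment
Table 4.7 Optimized parameters recommended by ACO-HCO
Table 4.8 Optimized parameters recommended by industry rules of thumb (RoT)
Table 4.9 Optimized WordCount parameters recommended by Starfish
Table 4.10 Optimized Sort parameters recommended by Starfish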


[1]Guan, L., Ke, X., Song, M., and Song, J., “A Survey of Research on Mobile Cloud Computing,” Proceedings of the 2011 IEEE/ACIS 10th International Conference on Computer and Information Science (ICIS), pp. 387-392, 2011.
[2]US Department of Commerce, NIST, “NIST Manuscript Publication Search,” http://www.nist.gov/manuscript-publication-search.cfm?pub_id=909616. [Accessed online at 6/30/2016].
[3]EMC, “Extracting Value from Chaos,” http://www.emc.com/collateral/analyst-reports/idc-extracting-value-from-chaos-ar.pdf [Accessed online at 6/30/2016]
[4]Zikopoulos, P. and Eaton, C., Understanding Big Data: Analytics for Enterprise Class Hadoop and Streaming Data, McGraw-Hill Osborne Media, pp. 3-15, 2011.
[5]Dean, J. and Ghemawat, S., “MapReduce: Simplified Data Processing on Large Clusters,” Proceedings of the 6th Conference on Symposium on Operating Systems Design and Implementation, San Francisco, CA, pp. 137-150, 2004.
[6]Ghemawat, S., Gobioff, H., and Leung, S.T., “The Google File System,” Proceedings of the Nineteenth ACM Symposium on Operating Systems Principles, Bolton Landing, NY, USA, pp. 29-43, 2003.
[7]Chang, F., Dean, J., Ghemawat, S., Hsieh, W. C., Wallach, D. A., Burrows, M., Chandra, T., Fikes, A., and Gruber, R. E., “Bigtable: A Distributed Storage System for Structured Data,” Proceedings of the 7th Symposium on Operating Systems Design and Implementation, Seattle, Washington, pp. 205-218, 2006.
[8]Hadoop MapReduce, https://hadoop.apache.org/docs/r1.2.1/mapred_tutorial.html [Accessed online at 6/30/2016]
[9]Babu, S., “Towards Automatic Optimization of MapReduce Programs,” Proceedings of the 1st ACM Symposium on Cloud Computing, New York, NY, USA, pp. 137-142, 2010.
[10]Yu, Z., Thomborson, C., Wang, C., Wang, J., and Li, R., “A Cloud-based Watermarking Method for Health Data Security,” Proceedings of the 2012 International Conference on High Performance Computing and Simulation (HPCS), pp. 642-647, 2012.
[11]Murakami, K., Hanyu, R., Zhao, Q., and Kaneda, Y., “Improvement of Security in Cloud Systems Based on Steganography,” Proceedings of the 2013 International Joint Conference on Awareness Science and Technology and Ubi-Media Computing (iCAST-UMEDIA), pp. 503-508, 2013.
[12]Yu, Z., Wang, C., Thomborson, C., Wang, J., Lian, S., and Vasilakos, A. V., “A Novel Watermarking Method for Software Protection in the Cloud,” Software Practice and Experience, Vol. 42, No. 4, pp. 409-430, Apr. 2012.
[13]Abbasy, M. R. and Shanmugam, B., “Enabling Data Hiding for Resource Sharing in Cloud Computing Environments Based on DNA Sequences,” Proceedings of the 2011 IEEE World Congress on Services (SERVICES), pp. 385-390, 2011.
[14]Huang, F., Zhao, H., Li, B., and Lv, Z., “Watermarking Massive Remote Sensor Images in Parallel,” Proceedings of the 2010 International Conference on Computational Intelligence and Software Engineering (CiSE), pp. 1-4, 2010.
[15]Yang, C.T., Shih, W.C., Chen, G.H., and Yu, S.C., “Implementation of A Cloud Computing Environment for Hiding Huge Amounts of Data,” Proceedings of the 2010 International Symposium on Parallel and Distributed Processing with Applications (ISPA), pp. 1-7, 2010.
[16]Yang, C.T., Lin, C.H., and Chang, G.L., “Implementation of Image Watermarking Processes on Cloud Computing Environments,” Proceedings of the Security-Enriched Urban Computing and Smart Grid, pp. 131-140, 2011.
[17]Cloudera Developer Blog, “7 Tips for Improving MapReduce Performance,” http://blog.cloudera.com/blog/2009/12/7-tips-for-improving-MapReduce-performance/. [Accessed online at 6/16/2015].
[18]Intel, “Optimizing Apache Hadoop Deployments,” http://www.intel.com/content/www/us/en/cloud-computing/cloud-computing-optimizing-hadoop-deployments-paper.html. [Accessed online at 6/16/2015].
[19]Herodotou, H., Lim, H., Luo, G., Borisov, N., Dong, L., Cetin, F. B., and Babu, S., “Starfish: A Self-tuning System for Big Data Analytics,” Proceedings of the 5th Conference on Innovative Data Systems Research (CIDR '11), pp. 261-272, 2011.
[20]Herodotou, H. and Babu, S., “Profiling, What-if Analysis, and Cost-based Optimization of MapReduce Programs,” Proceedings of the VLDB Endowment (PVLDB), Vol. 4, No. 11, pp. 1111-1122, 2011.
[21]Lim, H., Herodotou, H., and Babu, S., “Stubby: A Transformation-based Optimizer for MapReduce Workflows,” Technical Report, Duke Computer Science, http://www.cs.duke.edu/Starfish/files/stubby-technical-report.pdf. [Accessed online at 6/30/2016].
[22]Wu, D. and Gokhale, A., “A Self-tuning System Based on Application Profiling and Performance Analysis for Optimizing Hadoop MapReduce Cluster Configuration,” Proceedings of the 20th Annual International Conference on High Performance Computing, pp. 89-98, 2013.
[23]Liao, G., Datta, K., and Willke, T. L., “Gunther: Search-based Auto-tuning of MapReduce,” Proceedings of the 19th International Conference on Parallel Processing, Berlin, Heidelberg, pp. 406-419, 2013.
[24]Google App Engine, https://developers.google.com/appengine [Accessed online at 6/30/2016]
[25]Microsoft Windows Azure, http://www.windowsazure.com [Accessed online at 6/30/2016]
[26]Apache Hadoop, http://hadoop.apache.org [Accessed online at 6/30/2016]
[27]Ferreira, C., “Gene Expression Programming: A New Adaptive Algorithm for Solving Problems,” Complex Systems, Vol. 13, pp. 87-129, 2001.
[28]Goldberg, D. E., Genetic Algorithms in Search, Optimization and Machine Learning, 1st ed. Boston, MA, USA: Addison-Wesley Longman Publishing Co., Inc., 1989.
[29]Glover, F., “Tabu Search - Part I,” ORSA Journal on Computing, Vol. 1, No. 3, pp. 190-206, Aug. 1989.
[30]Van Laarhoven, P. J. M. and Aarts, E. H. L., Simulated Annealing: Theory and Applications, Springer Netherlands, pp. 7-15, 1987.
[31]Rumelhart, D. E. and McClelland, J. L., Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Vol. 1: Foundations, MIT Press, Cambridge, MA, USA, pp. 318-362, 1986.
[32]Dorigo, M. and Gambardella, L. M., “Ant Colony System: A Cooperative Learning Approach to the Traveling Salesman Problem,” IEEE Transactions on Evolutionary Computation, Vol. 1, No. 1, pp. 53-66, Apr. 1997.
[33]Eberhart, R. and Kennedy, J., “A New Optimizer Using Particle Swarm Theory,” Proceedings of the Sixth International Symposium on Micro Machine and Human Science, 1995. MHS ’95, pp. 39-43, 1995.
[34]Storn, R. and Price, K., “Differential Evolution - A Simple and Efficient Heuristic for Global Optimization over Continuous Spaces,” Journal of Global Optimization, Vol. 11, No. 4, pp. 341-359, Dec. 1997.
[35]Cordón, O., Herrera-Viedma, E., López-Pujalte, C., Luque, M., and Zarco, C., “A Review on the Application of Evolutionary Computation to Information Retrieval,” International Journal of Approximate Reasoning, Vol. 34, No. 2-3, pp. 241-264, Nov. 2003.
[36]Bei, Z., Yu, Z., Zhang, H., Xiong, W., Xu, C., Eeckhout, L., and Feng, S., “RFHOC: A Random-Forest Approach to Auto-Tuning Hadoop’s Configuration,” IEEE Transactions on Parallel and Distributed Systems, Vol. 27, No. 5, pp. 1470-1483, May 2016.
[37]Yigitbasi, N., Willke, T. L., Liao, G., and Epema, D., “Towards Machine Learning-Based Auto-tuning of MapReduce,” Proceedings of the 2013 IEEE 21st International Symposium on Modeling, Analysis & Simulation of Computer and Telecommunication Systems (MASCOTS), pp. 11-20, 2013.
[38]Wang, K., Lin, X., and Tang, W., “Predator: An Experience Guided Configuration Optimizer for Hadoop MapReduce,” Proceedings of the 2012 IEEE 4th International Conference on Cloud Computing Technology and Science (CloudCom), pp. 419-426, 2012.
[39]Lama, P. and Zhou, X., “AROMA: Automated Resource Allocation and Configuration of MapReduce Environment in the Cloud,” Proceedings of the 9th International Conference on Autonomic Computing, New York, NY, USA, pp. 63-72, 2012.
[40]Ye, T. and Kalyanaraman, S., “A Recursive Random Search Algorithm for Large-scale Network Parameter Configuration,” Proceedings of the 2003 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems, NY, USA, pp. 196-205, 2003.
[41]Liu, F.H., Liou, Y.R., Lo, H.F., Chang, K.C., and Lee, W.T., “The Comprehensive Performance Rating for Hadoop Clusters on Cloud Computing,” International Journal of Information and Electronics Engineering, Vol.4, No.6, pp. 480-484, Nov. 2014.
[42]Starfish Hadoop Log Analyzer, https://www.cs.duke.edu/starfish/release.html [Accessed online at 6/30/2016]
[43]Yang, H., Luan, Z., Li, W., and Qian, D., “MapReduce Workload Modeling with Statistical Approach,” Journal of Grid Computing, Vol. 10, No. 2, pp. 279-310, Jun. 2012.
[44]Dong, B., Zheng, Q., Tian, F., Chao, K.M., Ma, R., and Anane, R., “An Optimized Approach for Storing and Accessing Small Files on Cloud Storage,” Journal of Network and Computer Applications, Vol. 35, No. 6, pp. 1847-1862, Nov. 2012.
[45]Han, J., Ishii, M., and Makino, H., “A Hadoop Performance Model for Multi-Rack Clusters,” Proceedings of the 2013 5th International Conference on Computer Science and Information Technology (CSIT), pp. 265-274, 2013.
[46]GeneXproTools, http://www.gepsoft.com/gxpt.htm [Accessed online at 6/30/2016]
[47]ACOTSP V1.03, http://www.aco-metaheuristic.org/aco-code/ [Accessed online at 6/30/2016]
[48]Dev-C++, https://sourceforge.net/projects/orwelldevcpp/ [Accessed online at 6/30/2016]