Graduate student: Wei-Ji Wang (王偉吉)
Thesis title: Performance Enhancement of Apache Spark Execution with adaptive GC-aware load balancing mechanism
Advisor: Shang-Juh Kao (高勝助)
Oral examination committee: I-En Liao (廖宜恩), Yao-Nan Lien (連耀南)
Oral defense date: 2017-06-29
Degree: Master's
Institution: National Chung Hsing University (國立中興大學)
Department: Department of Computer Science and Engineering
Discipline: Engineering
Field: Electrical and Computer Engineering
Thesis type: Academic thesis
Year of publication: 2017
Graduation academic year: 105 (ROC calendar)
Language: Chinese
Number of pages: 27
Keywords: Apache Spark, Cluster Computing, Memory management, Load balancing, Task scheduling, Garbage collection
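Unlike Apache Flink, which manages memory actively, Apache Spark leaves memory management largely to the JVM, and therefore suffers from computation delays caused by garbage collection. During cluster computation, garbage collection accounts for roughly 20-25% of Apache Spark's total execution time.
This study proposes an adaptive GC-aware load balancing mechanism (the Flying-geese Mechanism). Using a dynamic allocation algorithm, the task scheduler assigns a number of tasks to each Executor according to the real-time load indicated by that Executor's heap occupancy. Experimental results show that the improvement over stock Apache Spark grows as the number of PageRank iterations increases, reaching a 6.80% performance gain at 200 iterations.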
Unlike Apache Flink, which uses an active memory management mechanism, Apache Spark relies only on the JVM heap space for memory allocation. When a major GC is triggered, the JVM stops the application threads. During Apache Spark cluster operations, these application pauses for garbage collection may take around 20-25% of the total execution time.
This study aims to improve Apache Spark by detecting the heap occupancy rate of each node before the task assignment of the next stage. By dynamically assigning the number of tasks according to heap occupancy, each executor can perform its tasks cooperatively and efficiently. The results show that the proposed method improves on Apache Spark's running time by 6.80% after about 200 iterations of PageRank on 5 clusters.
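The abstract does not give the allocation formula, so the following is only a minimal sketch of the general idea of heap-occupancy-aware task assignment, not the thesis's actual algorithm. It assumes each executor's share of the next stage's tasks is proportional to its free heap fraction (1 minus occupancy). The function name allocate_tasks, the executor ids, and the occupancy figures are all hypothetical.

```python
from typing import Dict


def allocate_tasks(total_tasks: int, heap_occupancy: Dict[str, float]) -> Dict[str, int]:
    """Split total_tasks across executors in proportion to free heap.

    heap_occupancy maps a (hypothetical) executor id to the fraction of its
    heap currently in use, in [0, 1]. Proportional weighting is an assumption
    made for illustration only.
    """
    if total_tasks < 0:
        raise ValueError("total_tasks must be non-negative")
    if not heap_occupancy:
        raise ValueError("at least one executor is required")

    # Free-heap fraction per executor, clamped to [0, 1].
    free = {ex: min(1.0, max(0.0, 1.0 - occ)) for ex, occ in heap_occupancy.items()}
    total_free = sum(free.values())
    if total_free == 0:
        # Every heap is reported full: fall back to an even split.
        free = {ex: 1.0 for ex in free}
        total_free = float(len(free))

    # Largest-remainder rounding so the shares sum exactly to total_tasks.
    exact = {ex: total_tasks * f / total_free for ex, f in free.items()}
    shares = {ex: int(x) for ex, x in exact.items()}
    leftover = total_tasks - sum(shares.values())
    by_remainder = sorted(exact, key=lambda ex: exact[ex] - shares[ex], reverse=True)
    for ex in by_remainder[:leftover]:
        shares[ex] += 1
    return shares


if __name__ == "__main__":
    # Made-up occupancy readings taken before assigning the next stage.
    print(allocate_tasks(30, {"exec-1": 0.2, "exec-2": 0.5, "exec-3": 0.8}))
```

Under this assumption, an executor whose heap is nearly full receives fewer tasks and is therefore less likely to trigger a major GC in the middle of the stage, while lightly loaded executors absorb the difference.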
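Abstract (Chinese)
Abstract
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Research Background and Motivation
1.2 Contributions
1.3 Thesis Organization
Chapter 2 Related Work
2.1 JVM Garbage Collection
2.2 Impact of Garbage Collection on Apache Spark
2.3 Apache Spark
2.3.1 Scheduling Mechanism
2.3.2 Executor Structure
Chapter 3 Adaptive GC-Aware Load Balancing Mechanism
3.1 Apache Spark Task Scheduling
3.2 Flying-geese Mechanism
Chapter 4 System Implementation and Analysis
4.1 Experimental Environment Setup
4.2 Experimental Results and Analysis
Chapter 5 Conclusion and Future Work
References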