Graduate student: Lee, Chin-Feng
Thesis title: Using Machine Learning to Manage Resources in Datacenters with Diverse Computing Requirements
Advisor: Chou, Jerry
Oral defense committee: Lai, Kuan-Chou; Lee, Che-Rung
Keywords: machine learning; resource management; runtime optimization; cluster
Apache Mesos has become a popular cluster resource management tool with the emergence of new cluster computing applications such as Big Data analytics and deep learning. The resource offer mechanism of Mesos lets framework schedulers choose the best resources based on their own constraints and preferences. The default hierarchical Dominant Resource Fairness (DRF) allocator gives near-optimal results for simple task placement preferences and resource requirements in a large resource pool running mostly short-lived jobs. However, if these properties do not hold, a higher offer rejection rate is expected, which degrades overall performance. Moreover, in scenarios where overall system throughput is the main concern, improving the allocator offers more room for optimization than having frameworks wait passively for desirable resource offers.
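
To make the DRF idea concrete, the following is a minimal sketch of flat (non-hierarchical) dominant-share selection. It is an illustration only, not the Mesos allocator code: the names `dominant_share` and `next_framework`, the resource keys, and the sample numbers are all assumptions introduced here.

```python
from typing import Dict

Resources = Dict[str, float]


def dominant_share(allocated: Resources, total: Resources) -> float:
    """Largest fraction of any single resource type held by one framework."""
    return max(
        allocated.get(name, 0.0) / capacity
        for name, capacity in total.items()
        if capacity > 0
    )


def next_framework(allocations: Dict[str, Resources], total: Resources) -> str:
    """Pick the framework with the smallest dominant share (ties by name)."""
    return min(
        sorted(allocations),
        key=lambda fw: dominant_share(allocations[fw], total),
    )


# Hypothetical cluster capacity and current allocations.
total = {"cpus": 9.0, "mem": 18.0}
allocations = {
    "framework_a": {"cpus": 3.0, "mem": 12.0},  # dominant resource: mem, 12/18
    "framework_b": {"cpus": 4.0, "mem": 2.0},   # dominant resource: cpus, 4/9
}

chosen = next_framework(allocations, total)
```

Here `framework_b` holds the smaller dominant share, so it would receive the next offer. The selection looks only at fairness, not at whether the offered resources suit the framework, which is why unsuitable offers can be rejected.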

Therefore, we propose using machine learning techniques to improve offer quality. We consider the problem of actively improving the quality of resource offers given limited information and limited interaction with users.

In this work, we propose a quality-aware allocator with a pre-defined quality function for optimizing job execution time. In addition, we implemented an emulation environment to evaluate the performance of the proposed allocator under various synthetic batch-processing workloads.
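
As a rough illustration of what quality-aware allocation could look like, the sketch below scores candidate offers with a quality function and picks the best one. The abstract does not give the thesis's quality function or model, so `example_quality`, its weights, and the `Offer` and `Demand` fields are hypothetical stand-ins; a trained model's prediction could in principle be passed in as `quality` instead.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Offer:
    agent: str
    cpus: float
    mem_gb: float
    holds_input_data: bool  # assumed flag: input data is stored on this agent


@dataclass(frozen=True)
class Demand:
    cpus: float
    mem_gb: float


def example_quality(offer: Offer, demand: Demand) -> float:
    """Hypothetical score; higher means a better fit for the task."""
    if offer.cpus <= 0 or offer.cpus < demand.cpus or offer.mem_gb < demand.mem_gb:
        return 0.0  # the offer cannot host the task and would be rejected
    score = 1.0
    if offer.holds_input_data:
        score += 1.0  # assumed bonus for data locality
    # Assumed penalty for CPU capacity the task would leave unused.
    unused = (offer.cpus - demand.cpus) / offer.cpus
    return score - 0.5 * unused


def best_offer(
    offers: List[Offer],
    demand: Demand,
    quality: Callable[[Offer, Demand], float] = example_quality,
) -> Optional[Offer]:
    """Return the highest-quality usable offer, or None if none is usable."""
    scored = [(quality(offer, demand), offer) for offer in offers]
    usable = [pair for pair in scored if pair[0] > 0.0]
    if not usable:
        return None
    return max(usable, key=lambda pair: pair[0])[1]


offers = [
    Offer("agent-1", cpus=8.0, mem_gb=32.0, holds_input_data=False),
    Offer("agent-2", cpus=4.0, mem_gb=16.0, holds_input_data=True),
    Offer("agent-3", cpus=1.0, mem_gb=2.0, holds_input_data=True),
]
demand = Demand(cpus=2.0, mem_gb=4.0)
selected = best_offer(offers, demand)
```

With these sample values the sketch selects `agent-2`, which fits the demand, holds the input data, and leaves less capacity unused than `agent-1`.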

Our evaluation shows up to a 2x improvement in total completion time, 33% higher residual capacity, a 46% lower offer rejection rate, and 70% better allocation placement with respect to data locality.
1 Introduction
2 Background
2.1 Mesos System Architecture
2.2 Problems and Challenges
3 Approach
3.1 System Design
3.2 Resource Allocation
3.3 Quality Function
3.4 Machine Learning Model Construction
4 Emulation Environment
4.1 System Configuration Generator
4.2 Workload Generator
4.3 Queue Manager
4.4 Mesosaurus
4.5 Task Emulator
5 Evaluation
5.1 Environment and Parameter Settings
5.2 Workloads
5.3 Quality Prediction Error
5.4 Total Completion Time
5.5 Residual Capacity
5.6 Offer Rejection Rate
5.7 Data Locality
6 Related Work
7 Conclusion
