
National Digital Library of Theses and Dissertations in Taiwan

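Graduate student: Lee, Chin-Feng (李青峰)
Thesis title: Using Machine Learning to Manage Resources in Datacenters with Diverse Computing Requirements
Thesis title (Chinese): 運用機器學習解決資料中心多樣化計算要求的資源管理問題
Advisor: Chou, Jerry (周志遠)
Oral defense committee: Lai, Kuan-Chou (賴冠州); Lee, Che-Rung (李哲榮)
Oral defense date: 2017-08-10
Degree: Master's
Institution: National Tsing Hua University
Department: Department of Computer Science (資訊工程學系所)
Discipline: Engineering
Field: Electrical Engineering and Computer Science
Thesis type: Academic thesis
Publication year: 2017
Academic year of graduation: 106 (ROC calendar)
Language: English
Number of pages: 32
Keywords: machine learning; resource management; runtime optimization; cluster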
Usage statistics:
  • Cited by: 0
  • Views: 581
  • Downloads: 139
  • Saved to bibliography lists: 1
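Abstract (translated from Chinese): With the rise of new cluster computing applications such as big data analytics and deep learning, Apache Mesos has become a popular cluster resource management tool. The resource offer mechanism of Mesos lets framework schedulers choose the best resources according to an application's actual constraints and preferences. The default hierarchical Dominant Resource Fairness (DRF) allocator achieves near-optimal allocation when task placement and resource requirements are simple. When that premise does not hold, however, a higher offer rejection rate can be expected, which in turn degrades overall performance. Moreover, when overall system throughput is the priority, improving the resource allocator offers more room for performance gains than having framework schedulers wait passively for suitable resources to be offered.
  In view of this, we propose using machine learning to improve the quality of resource offers. This study considers the problem of actively improving resource offer quality under limited information and limited user interaction.
  This study proposes a quality-aware resource allocator with a predefined quality function for optimizing job execution time. In addition, we implemented an emulation environment to evaluate the proposed allocator under a variety of synthetic batch-processing workloads.
  Our experiments show that, in the best case, the proposed approach improves total job completion time by up to 2x, reduces resource occupancy by 33%, lowers the offer rejection rate by 46%, and satisfies jobs' data locality requirements 70% better.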
Apache Mesos has become a popular cluster resource management tool with the emergence of new cluster computing applications such as big data analytics and deep learning. The resource offer mechanism of Mesos lets framework schedulers choose the best resources based on their own constraints and preferences. The default hierarchical Dominant Resource Fairness (DRF) allocator gives near-optimal results for simple task placement preferences and resource requirements in a large resource pool running mostly short-lived jobs. If these properties do not hold, however, a higher offer rejection rate is expected, which degrades overall performance. Moreover, when overall system throughput is the main concern, improving the allocator offers more room for optimization than passively waiting for desirable resource offers to reach the frameworks.
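
The fairness policy named above can be illustrated with a minimal sketch of flat (non-hierarchical) DRF. This is not the Mesos allocator's code: the function names, the framework labels "A" and "B", and the cluster sizes are assumptions chosen for illustration. Each round grants one task to the framework whose dominant share, the largest fraction it holds of any single resource, is currently lowest.

```python
def dominant_share(allocated, capacity):
    """Largest fraction of any single resource held by one framework."""
    return max(allocated[r] / capacity[r] for r in capacity)


def drf_allocate(capacity, demands):
    """Grant one task at a time to the framework with the lowest dominant share.

    capacity: {resource: total amount}            (illustrative cluster)
    demands:  {framework: {resource: per-task}}   (hypothetical frameworks)
    Returns the number of tasks granted to each framework.
    """
    allocated = {f: {r: 0.0 for r in capacity} for f in demands}
    used = {r: 0.0 for r in capacity}
    tasks = {f: 0 for f in demands}
    active = set(demands)
    while active:
        # sorted() makes tie-breaking deterministic (alphabetical order).
        f = min(sorted(active),
                key=lambda name: dominant_share(allocated[name], capacity))
        demand = demands[f]
        if any(used[r] + demand[r] > capacity[r] for r in capacity):
            # This framework's next task no longer fits; stop serving it.
            active.remove(f)
            continue
        for r in capacity:
            used[r] += demand[r]
            allocated[f][r] += demand[r]
        tasks[f] += 1
    return tasks


cluster = {"cpu": 9, "mem": 18}
per_task = {"A": {"cpu": 1, "mem": 4}, "B": {"cpu": 3, "mem": 1}}
print(drf_allocate(cluster, per_task))
```

In this sketch the memory-heavy framework A and the CPU-heavy framework B end up with equal dominant shares, which is the equalizing behavior DRF aims for. Nothing here models placement constraints, which is the gap the abstract goes on to address.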

Therefore, we propose using machine learning techniques to improve offer quality. We consider the problem of actively improving the quality of resource offers given limited information and limited interaction with users.

In this work, we propose a quality-aware allocator with a predefined quality function for optimizing job execution time. In addition, we implemented an emulation environment to evaluate the performance of the proposed allocator under various synthetic batch-processing workloads.
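
As a rough illustration of what "quality-aware" could mean, the sketch below ranks candidate offers for a task with a quality function and picks the best one. Everything in it is an assumption: this record does not give the thesis's actual quality function or learned model, so a hand-written runtime estimate stands in for the predictor, and `Offer`, `Task`, `REMOTE_READ_PENALTY`, and the agent names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Offer:
    agent: str            # hypothetical agent (node) name
    cpus: float
    mem_gb: float
    has_local_data: bool  # whether the task's input data lives on this agent


@dataclass(frozen=True)
class Task:
    cpus: float
    mem_gb: float
    base_runtime_s: float  # runtime when the input data is local


# Assumed slowdown for reading input remotely; not a figure from the thesis.
REMOTE_READ_PENALTY = 1.5


def estimated_runtime(task, offer):
    """Stand-in for a learned predictor: infinite if the task does not fit."""
    if offer.cpus < task.cpus or offer.mem_gb < task.mem_gb:
        return float("inf")
    factor = 1.0 if offer.has_local_data else REMOTE_READ_PENALTY
    return task.base_runtime_s * factor


def quality(task, offer):
    """Higher is better; here simply the negated runtime estimate."""
    return -estimated_runtime(task, offer)


def best_offer(task, offers):
    """Return the feasible offer with the highest quality, or None."""
    feasible = [o for o in offers
                if estimated_runtime(task, o) != float("inf")]
    return max(feasible, key=lambda o: quality(task, o), default=None)


offers = [
    Offer("agent-1", cpus=4, mem_gb=8, has_local_data=False),
    Offer("agent-2", cpus=2, mem_gb=4, has_local_data=True),
    Offer("agent-3", cpus=1, mem_gb=1, has_local_data=True),
]
chosen = best_offer(Task(cpus=2, mem_gb=4, base_runtime_s=100.0), offers)
print(chosen.agent if chosen else "no feasible offer")
```

Ranking offers on the allocator side, rather than letting a framework reject unsuitable ones, is the design point the abstract argues for; swapping `estimated_runtime` for a trained regressor would be one way to make the score data-driven.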

Our evaluation shows up to a 2x improvement in total completion time, 33% higher residual capacity, a 46% lower offer rejection rate, and 70% better allocation placement with respect to data locality.
1 Introduction
2 Background
2.1 Mesos System Architecture
2.2 Problems and Challenges
3 Approach
3.1 System Design
3.2 Resource Allocation
3.3 Quality Function
3.4 Machine Learning Model Construction
4 Emulation Environment
4.1 System Configuration Generator
4.2 Workload Generator
4.3 Queue Manager
4.4 Mesosaurus
4.5 Task Emulator
5 Evaluation
5.1 Environment and Parameter Settings
5.2 Workloads
5.3 Quality Prediction Error
5.4 Total Completion Time
5.5 Residual Capacity
5.6 Offer Rejection Rate
5.7 Data Locality
6 Related Work
7 Conclusion