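Graduate student: Chen-Hsin Ding (丁振新)
Thesis title: A task run time predictor using machine learning techniques (使用機器學習方法預測程序執行時間)
Advisor: Ya-Yunn Su (蘇雅韻)
Oral defense committee: Chih-Wen Hsueh (薛智文), Shou-De Lin (林守德)
Oral defense date: 2013-07-10
Degree: Master's
Institution: National Taiwan University
Department: Graduate Institute of Networking and Multimedia
Discipline: Computer Science
Field: Networking
Thesis type: Academic thesis
Publication year: 2013
Graduation academic year: 101 (ROC calendar)
Language: English
Pages: 55
Keywords: cloud computing, scheduler, machine learning, prediction, data mining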
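Cloud computing can supply computing resources when the resources of local machines are insufficient. To help a system scheduler decide whether a task should run locally or in the cloud, this thesis provides a task run time prediction method, so that the scheduler can make that decision based on the estimated run time. We propose a machine learning approach (support vector regression) that builds a prediction model from historical information. We collected process execution histories, system load information, and file system information to predict the run time of repeatedly executed, compute-intensive tasks. In our experimental evaluation, the method achieves a mean error of 20% relative to the mean run time, which shows that predicting the run time of compute-intensive open source programs is feasible.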
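To illustrate how a scheduler could act on a predicted run time, here is a minimal sketch. It is not the thesis's scheduler: the cost model (local run time stretched by the number of competing CPU-bound jobs, cloud run time increased by a fixed start-up overhead) and all names (`choose_placement`, `local_load`, `cloud_overhead_seconds`) are assumptions made only for this example.

```python
from dataclasses import dataclass


@dataclass
class Placement:
    target: str              # "local" or "cloud"
    expected_seconds: float  # estimated time until the task finishes


def choose_placement(predicted_run_seconds: float,
                     local_load: float,
                     cloud_overhead_seconds: float) -> Placement:
    """Pick the target where the task is expected to finish sooner.

    Hypothetical cost model (not taken from the thesis):
    - local: the task shares a CPU with `local_load` other CPU-bound
      jobs, so its run time is stretched by a factor (1 + local_load).
    - cloud: the task runs at full speed after a fixed start-up and
      data-transfer overhead.
    """
    local_finish = predicted_run_seconds * (1.0 + local_load)
    cloud_finish = predicted_run_seconds + cloud_overhead_seconds
    if local_finish <= cloud_finish:
        return Placement("local", local_finish)
    return Placement("cloud", cloud_finish)


if __name__ == "__main__":
    # A long task on a busy machine versus a short one on the same machine.
    print(choose_placement(600.0, 2.0, 300.0))
    print(choose_placement(60.0, 2.0, 300.0))
```

Under this model the predicted run time decides the outcome: a short task stays local even on a loaded machine because the cloud overhead dominates, while a long task is sent to the cloud.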

Cloud computing has been widely used for on-demand computing in recent years. A local computing cluster cannot always provide sufficient resources for every user. A computing cluster assisted by cloud resources lets users obtain computing resources faster while local machines are fully loaded. We collected traces from workstations in our department to understand how users use machines for computing and to try to improve the scheduler. To help a system scheduler dispatch an incoming job either to a local machine or to the cloud, we provide a task run time advisor that lets the scheduler make decisions accurately. The run time advisor is a support vector regression model constructed from historical information. The prediction error is less than 20% of the mean run time when predicting CPU-bound open source projects. Our evaluation is data-driven: the trace was collected from workstations in NTU CSIE.
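The abstract does not define its error figure precisely. One plausible reading of "error of mean run time" is the mean absolute prediction error divided by the mean observed run time. The sketch below computes that quantity under this assumed interpretation; the function name `relative_mean_error` and the sample numbers are illustrative and do not come from the thesis.

```python
from typing import Sequence


def relative_mean_error(actual: Sequence[float],
                        predicted: Sequence[float]) -> float:
    """Mean absolute error divided by the mean observed run time.

    This is an assumed interpretation of the abstract's error figure,
    not a definition quoted from the thesis.
    """
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_abs_error = sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)
    mean_run_time = sum(actual) / len(actual)
    return mean_abs_error / mean_run_time


if __name__ == "__main__":
    # Made-up run times in seconds, for illustration only.
    observed = [100.0, 200.0, 300.0]
    estimated = [110.0, 180.0, 330.0]
    print(f"{relative_mean_error(observed, estimated):.0%}")
```

With this definition, a value below 0.20 corresponds to the "less than 20%" figure in the abstract.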

Acknowledgements
Abstract (Chinese)
Abstract
1 Introduction
2 Problem Statements
2.1 Motivation
2.2 Understanding workload characteristics
2.3 Improving scheduler decisions
2.4 Goals
3 Trace Collection
3.1 Process status trace
3.2 Host load trace
3.3 File system trace
4 Workload Characteristics
4.1 Methodology
4.2 Workload clustering
4.3 Statistical analysis
4.4 Insights from task classification and statistical analysis
5 Task Run Time Prediction
5.1 A machine learning technique approach
5.2 Methodology
6 Evaluations
6.1 A data driven approach experiment
6.2 Initial evaluation
6.3 One month long trace evaluation
6.4 Filtering unpredictable processes
7 Discussions
7.1 Factors of unpredictability
7.2 Why whitelist works
7.3 Future work
8 Related Works
9 Conclusions
Bibliography

