Thesis title (English): Dynamic Task Amount Adjustment Policy Based on Intermediate Data Quantity Prediction for Improving Performance of Small-Scale MapReduce Clouds
Keywords (English): Cloud Computing; Intermediate Data; Task; Prediction
MapReduce clouds are now a common cloud computing platform. Applications written to the MapReduce design principle can draw on the high computing power of a cloud to solve many kinds of problems. However, input data are neither uniform nor evenly arranged and distributed, and different applications apply different logic when processing input data to produce intermediate data. As a result, intermediate data may be distributed unevenly across the computers in the cloud, a condition known as intermediate data skew. When intermediate data skew occurs, some computers sit idle while others are busy, and overall performance degrades severely. If each computer is assigned a suitable number of tasks for processing input data and intermediate data, the computing power of the idle computers need not be wasted. In this thesis, we propose a Dynamic Task Amount Adjustment Policy (DTAAP), based on predicting the quantity of intermediate data, to improve the performance of small-scale MapReduce clouds. In addition, we run commonly used applications to measure the performance of DTAAP and compare it with two other MapReduce platforms.
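The idea of matching each computer's task count to its expected share of intermediate data can be illustrated with a minimal sketch. This is not the DTAAP algorithm from the thesis, whose prediction and adjustment rules are not given in the abstract. The function name `allocate_tasks`, the per-node byte predictions, and the proportional-split rule are all assumptions made for illustration.

```python
def allocate_tasks(predicted_bytes, total_tasks, min_tasks=1):
    """Split total_tasks across nodes in proportion to predicted intermediate data.

    Hypothetical illustration only; not the DTAAP algorithm from the thesis.
    predicted_bytes maps a node name to its predicted intermediate data size.
    """
    nodes = list(predicted_bytes)
    if total_tasks < min_tasks * len(nodes):
        raise ValueError("total_tasks too small for the per-node minimum")

    total = sum(predicted_bytes.values())
    spare = total_tasks - min_tasks * len(nodes)

    # Fractional share of the spare tasks for each node (equal split if no data).
    if total == 0:
        shares = {n: spare / len(nodes) for n in nodes}
    else:
        shares = {n: spare * predicted_bytes[n] / total for n in nodes}

    # Every node gets the minimum plus the whole part of its share.
    alloc = {n: min_tasks + int(shares[n]) for n in nodes}

    # Hand out the tasks lost to rounding, largest fractional part first.
    leftover = total_tasks - sum(alloc.values())
    by_fraction = sorted(nodes, key=lambda n: shares[n] - int(shares[n]), reverse=True)
    for n in by_fraction[:leftover]:
        alloc[n] += 1
    return alloc


if __name__ == "__main__":
    # Assumed predictions: node-a is expected to hold most of the skewed data.
    print(allocate_tasks({"node-a": 600, "node-b": 300, "node-c": 100}, total_tasks=13))
```

In this sketch a node predicted to hold more intermediate data receives more tasks, while the per-node minimum keeps lightly loaded nodes from sitting completely idle.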
