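Author: Yan-Jhou Huang (黃彥周)
Thesis title: Massively R Data Parallel Computation over Hadoop without MapReduce (基於Hadoop之非MapReduce的大資料R平行運算)
Advisor: Hung-Chang Hsiao (蕭宏章)
Degree: Master's
Institution: National Cheng Kung University
Department: Department of Computer Science and Information Engineering
Discipline: Engineering
Field: Electrical and Computer Engineering
Thesis type: Academic thesis
Year of publication: 2016
Academic year of graduation: 104 (ROC calendar)
Language: Chinese
Number of pages: 31
Keywords: YARN; R; Distributed processing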
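R is a scripting language used mainly for statistical analysis, graphics, and data mining. It is known for its rich library of function packages, its active developer community, and the extensibility of its packages. R is open source, and precompiled binaries are available for many platforms. As R has grown more popular in statistical analysis, its computational performance has become an increasing concern. R is designed to run in a single thread, and besides the official parallel package there are already many packages that support parallel computation. These packages, however, all parallelize at the granularity of CPU cores, so large-data R workloads end up running as heavily loaded computation on a single server-class machine.
In this thesis we propose a new distributed computing framework to address R's shortcomings when processing large volumes of data. The design aims to port R programs to a distributed environment while hiding the details of distribution. The framework is a computation model built on Hadoop YARN: it uses the cluster resource management and allocation that YARN provides and builds a workflow and management layer suited to running R. The core idea of YARN is to delegate cluster resource management and job management to different components: a global ResourceManager (RM) and a per-application ApplicationMaster (AM). Our design implements the AM and defines a control flow suited to R computation. The goal is to schedule and dispatch R tasks into Containers, the unit of compute resources provided by the RM, with the AM controlling the workflow, so that R runs in a distributed fashion. Beyond basic execution services such as recording console messages and errors raised during R execution, the framework also provides advanced distributed features such as resource file distribution, dynamic task dispatch, locality-aware task scheduling, and R user-defined functions.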
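To make the idea of locality-aware task scheduling concrete, the following is a minimal sketch, not the thesis's actual scheduler. It assumes each task knows which hosts hold its input data and each container reports the host it runs on; the function name `schedule` and the tuple layouts are hypothetical and are not part of the YARN API.

```python
def schedule(tasks, containers):
    """Assign tasks to free containers, preferring data-local hosts.

    tasks:      list of (task_id, set of hosts holding the task's input)
    containers: list of (container_id, host)
    Returns a dict mapping task_id -> container_id.
    (Hypothetical data layout, chosen for illustration only.)
    """
    free = list(containers)
    assignment = {}
    deferred = []

    # First pass: place each task on a container whose host holds its data.
    for task_id, hosts in tasks:
        match = next((c for c in free if c[1] in hosts), None)
        if match is not None:
            free.remove(match)
            assignment[task_id] = match[0]
        else:
            deferred.append(task_id)

    # Second pass: tasks without a local container take any remaining one.
    for task_id in deferred:
        if not free:
            break  # no capacity left; the task stays unassigned for now
        assignment[task_id] = free.pop(0)[0]

    return assignment


if __name__ == "__main__":
    tasks = [("t1", {"nodeA"}), ("t2", {"nodeB"}), ("t3", {"nodeC"})]
    containers = [("c1", "nodeB"), ("c2", "nodeA"), ("c3", "nodeD")]
    print(schedule(tasks, containers))
```

Tasks that find no local container fall back to any free one, which trades locality for progress; a real scheduler might instead wait briefly for a local container to free up.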

R is a free, open-source scripting language with facilities for data manipulation, calculation, statistical computing, and graphical display. R is also known for its active community, in which developers share libraries and contributions. Parallel processing, however, is a weak point: R was originally designed to run in a single thread. Although several packages parallelize R processes across CPU cores, R computation still runs in a single server-level environment most of the time.
In this paper, we summarize the design, development, current state, and planned enhancements of a new application that utilizes the cluster resource management of Hadoop YARN. The fundamental idea of YARN is to split resource management and job control and scheduling into separate components: a global ResourceManager (RM) and a per-application ApplicationMaster (AM). Our approach implements an AM together with a complete job control flow. The goal is to schedule and dispatch R tasks into isolated resource containers provided by the RM and controlled by the AM, using cluster resources to build a distributed processing environment for R. Basic support for R processing, such as console and runtime error recording, is provided. The system also supports some advanced options to assist R processing, such as resource file dispatch, dynamic scheduling, a locality scheduler, and R UDFs.
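The dynamic dispatch and error recording described above can be illustrated with a small simulation. This is a sketch under stated assumptions, not the thesis's implementation: worker threads stand in for YARN containers, plain Python callables stand in for R tasks, and the name `run_tasks` is hypothetical.

```python
import queue
import threading


def run_tasks(tasks, num_containers):
    """Dispatch tasks dynamically to a fixed pool of simulated containers.

    tasks: list of (task_id, callable); each callable stands in for an R task
           and returns the text it would print to the console.
    Returns (console, errors): dicts keyed by task_id.
    """
    pending = queue.Queue()
    for task in tasks:
        pending.put(task)

    console, errors = {}, {}
    lock = threading.Lock()

    def container_loop():
        # Each simulated container pulls the next task as soon as it is free,
        # so faster containers naturally process more tasks.
        while True:
            try:
                task_id, fn = pending.get_nowait()
            except queue.Empty:
                return
            try:
                output = fn()
                with lock:
                    console[task_id] = output
            except Exception as exc:
                # Record the runtime error instead of aborting the whole job.
                with lock:
                    errors[task_id] = repr(exc)

    threads = [threading.Thread(target=container_loop)
               for _ in range(num_containers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return console, errors


if __name__ == "__main__":
    demo = [("a", lambda: "ok"), ("b", lambda: 1 / 0), ("c", lambda: "done")]
    print(run_tasks(demo, num_containers=2))
```

Pulling from a shared queue, rather than assigning tasks up front, is one simple way to get dynamic scheduling: no container idles while work remains.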

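Abstract (Chinese)
Abstract
Extended Abstract
Acknowledgements
Table of Contents
List of Tables
List of Figures
Chapter 1 Introduction
Chapter 2 Background
2.1 YARN Cluster Resource Management
2.1.1 ResourceManager (RM)
2.1.2 NodeManager (NM)
2.1.3 ApplicationMaster (AM)
2.1.4 File Handling
Chapter 3 System Architecture
3.1 Usage Example
3.1.1 User Interface
3.1.2 User and Administrator Maintenance
3.2 Client-Side Execution Flow
3.2.1 Task Monitoring
3.3 ApplicationMaster-Side Execution Flow
3.4 Container-Side Execution Flow
Chapter 4 System Design
4.1 Task Scheduling and Dispatch
4.2 Dynamic Communication within Tasks
4.3 Fault Tolerance
4.4 R User-Defined Functions
Chapter 5 Experiments
5.1 Experimental Environment
5.2 Experimental Results
5.3 System Overhead
Chapter 6 Related Work
Chapter 7 Conclusion
References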
