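Graduate student: Jyun-Kai Wu (吳浚楷)
Thesis title: An Acceleration Toolkit of Matlab based on GPU clusters (一個建構在GPU運算叢集的Matlab加速工具)
Advisor: Tyng-Yeu Liang (梁廷宇)
Degree: Master's
Institution: National Kaohsiung University of Applied Sciences
Department: Department of Electrical Engineering
Field: Engineering
Discipline: Electrical and Computer Engineering
Thesis type: Academic thesis
Graduation academic year: 100 (ROC calendar, i.e. 2011–2012)
Language: Chinese
Pages: 105
Keywords: Matlab; GPGPU; load balance mechanism; data cache; lazy data update protocol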
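This study proposes ATOM, a software toolkit that uses GPU computing clusters to accelerate Matlab. When the toolkit is in use, computation commands that the user issues in Matlab are intercepted and redirected to ATOM servers for parallel computation.

The computing devices in a GPU cluster differ in capability. To make full use of them, ATOM supports a load balancing mechanism that allocates each device an appropriate amount of computation. This balances the load and thereby increases Matlab's execution speed.

To reduce unnecessary network transmission overhead, the ATOM server side adopts data caching together with a lazy data update protocol, keeping network transmission cost to a minimum. Data caching lets users upload the data needed for computation in advance, so later computations no longer have to retrieve it over the network. The lazy update mechanism updates data only when the user actually needs to perform a computation.

Experiments show that, through these mechanisms, ATOM can effectively utilize GPU computing clusters and clearly improve the computational performance of Matlab.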
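The abstract does not say how ATOM decides what counts as an "appropriate amount" of work for each device. The sketch below illustrates one common way to do it, proportional partitioning. It is not ATOM's own algorithm, which Section 4.6 describes, and it rests on these assumptions:

- Each device is described by a positive relative-capability weight, for example taken from a short benchmark. These weights are hypothetical.
- The function name `partition_rows` is also hypothetical.
- A matrix's rows are split into contiguous slices in proportion to the weights.
- Largest-remainder rounding ensures that every row is assigned exactly once.

```python
from fractions import Fraction
from typing import List, Sequence, Tuple


def partition_rows(n_rows: int, weights: Sequence[float]) -> List[Tuple[int, int]]:
    """Split rows [0, n_rows) into contiguous (start, stop) slices, one per device,
    sized in proportion to each device's hypothetical capability weight."""
    if n_rows < 0:
        raise ValueError("n_rows must be non-negative")
    if not weights or any(w <= 0 for w in weights):
        raise ValueError("weights must be a non-empty list of positive numbers")

    # Exact rational arithmetic so the ideal shares sum to exactly n_rows.
    fw = [Fraction(w) for w in weights]
    total = sum(fw)
    ideal = [n_rows * w / total for w in fw]
    counts = [int(x) for x in ideal]  # floor of each ideal share (all shares are >= 0)

    # Largest-remainder rounding: give the leftover rows to the devices
    # whose ideal share lost the most to flooring.
    leftover = n_rows - sum(counts)
    order = sorted(range(len(fw)), key=lambda i: ideal[i] - counts[i], reverse=True)
    for i in order[:leftover]:
        counts[i] += 1

    # Turn per-device row counts into contiguous slices.
    slices: List[Tuple[int, int]] = []
    start = 0
    for c in counts:
        slices.append((start, start + c))
        start += c
    return slices


if __name__ == "__main__":
    # Hypothetical cluster: one GPU roughly twice as fast as each of the other two.
    print(partition_rows(1000, [2.0, 1.0, 1.0]))
    # [(0, 500), (500, 750), (750, 1000)]
```

Contiguous slices keep each device's portion of the matrix in one block, which is convenient when the rows must be copied to device memory. A real scheduler would also have to weigh transfer cost against compute speed.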
This research develops ATOM, an acceleration toolkit for Matlab based on GPU clusters. With this toolkit, the matrix-operation instructions issued in Matlab are captured and redirected to ATOM servers for parallel computing.

Because the computing devices in a GPU cluster differ in capability, ATOM provides a load balancing mechanism so that resources are fully utilized. Each computing device is assigned an appropriate amount of computational data, which balances the load and increases the execution speed of Matlab.

In addition, to reduce unnecessary communication, ATOM applies data caching and a lazy data-update protocol. Together they minimize the cost of distributing data over the GPU cluster for parallel computing. Data caching lets users upload data to the ATOM servers in advance, so the servers need not fetch data from Matlab during computation. The lazy update protocol does not keep the cached data consistent until the data is actually required.

Experiments show that, with these mechanisms, ATOM exploits GPU clusters effectively and clearly improves the performance of Matlab.
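The abstract combines two ideas. First, data uploaded once stays cached on the server. Second, later client-side changes are pushed only when a computation actually needs them. The toy model below illustrates that combination under stated assumptions, not as ATOM's implementation (Sections 4.3 and 4.7 describe the real design):

- The classes `CacheServer` and `LazyClient` and all their methods are hypothetical.
- An in-process call stands in for a network transfer.
- `transfers` simply counts how often data would cross the network.

```python
from typing import Dict, List, Set


class CacheServer:
    """Hypothetical stand-in for an ATOM server that caches uploaded matrices."""

    def __init__(self) -> None:
        self.cache: Dict[str, List[float]] = {}
        self.transfers = 0  # counts simulated network uploads

    def receive(self, name: str, data: List[float]) -> None:
        # Store (or overwrite) the cached copy; each call models one network transfer.
        self.cache[name] = list(data)
        self.transfers += 1

    def total(self, name: str) -> float:
        # Computes on the cached copy only; nothing is fetched back from the client.
        return sum(self.cache[name])


class LazyClient:
    """Client that uploads once, then defers pushing local edits until they are needed."""

    def __init__(self, server: CacheServer) -> None:
        self.server = server
        self.local: Dict[str, List[float]] = {}
        self.dirty: Set[str] = set()  # names whose server copy is stale

    def upload(self, name: str, data: List[float]) -> None:
        # Data cache: send the data to the server ahead of any computation.
        self.local[name] = list(data)
        self.server.receive(name, self.local[name])
        self.dirty.discard(name)

    def set_item(self, name: str, index: int, value: float) -> None:
        # Lazy update: change the local copy and mark the server copy stale, but send nothing.
        self.local[name][index] = value
        self.dirty.add(name)

    def total(self, name: str) -> float:
        # Bring the server copy up to date only when a computation actually uses it.
        if name in self.dirty:
            self.server.receive(name, self.local[name])
            self.dirty.discard(name)
        return self.server.total(name)


if __name__ == "__main__":
    server = CacheServer()
    client = LazyClient(server)
    client.upload("A", [1.0, 2.0, 3.0])  # one transfer
    client.set_item("A", 0, 10.0)        # no transfer yet
    client.set_item("A", 1, 20.0)        # no transfer yet
    print(client.total("A"), server.transfers)  # 33.0 2
    print(client.total("A"), server.transfers)  # 33.0 2 (cache already current)
```

In this model, the two edits cost a single transfer instead of two, and repeated computations on unchanged data cost none. That is the kind of saving the abstract attributes to caching combined with lazy updates.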
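Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Research Motivation and Objectives
1.2 Thesis Organization
Chapter 2 Related Work and Background
2.1 MatlabMPI
2.2 pMatlab
2.3 Otter
2.4 GPUmat
2.5 PCT (Parallel Computing Toolbox)
2.6 Jacket
Chapter 3 Design of ATOM
3.1 System Features
3.2 Currently Supported Interfaces and Limitations
3.3 System Architecture
3.4 Resource Allocation
3.5 GPU Object Management and Synchronization Mechanism
3.6 Task Dispatching in ATOM
Chapter 4 Implementation of ATOM
4.1 ATOM Workflow
4.1.1 Client-side Workflow
4.1.2 ATOM Registrar
4.1.2.1 Registrar Workflow
4.1.2.2 Registrar Implementation
4.1.3 Server-side Workflow
4.2 Implementation and Workflow of the Mex Block
4.2.1 Implementation of the Mex Block
4.2.2 Workflow of the Mex Block
4.3 Hash Table Block
4.4 Workflow of the Node Dispatcher
4.5 Workflow of the Device Dispatcher
4.6 Load Balancing Mechanism
4.7 Data Synchronization Mechanism
Chapter 5 Performance Measurement and Evaluation
5.1 Experimental Environment and Performance Tests
5.1.1 Single-node Performance and Overhead
5.1.2 Overhead of the ATOM System
5.1.3 Comparison of ATOM with Other Systems
5.1.3.1 Comparison at High Computational Intensity
5.1.3.2 Comparison at Low Computational Intensity
5.1.3.3 Comparison of Data Upload
5.1.4 Multi-node Performance and Overhead
5.2 Load Balancing Experiments
5.2.1 Load Balancing Mechanism
5.2.2 Load Balancing Performance Comparison
5.3 Check and Update Mechanism
Chapter 6 Conclusion and Future Work
6.1 Conclusion
6.2 Future Work
References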