National Digital Library of Theses and Dissertations in Taiwan


Detailed Record

Author: 楊智偉
Author (English): Chih-Wei Young
Title: 在GPU上使用CUDA處理稀疏建構函式
Title (English): Sparse Construction Functions for GPU Processing with CUDA
Advisor: 張榮貴
Advisor (English): Rong-Guey Chang
Committee: 張榮貴、陳鵬升、黃元欣、薛智文
Committee (English): Rong-Guey Chang, Peng-Sheng Chen, Yuan-Shin Hwang, Chih-Wen Hsueh
Defense Date: 2013-07-08
Degree: Master's
Institution: National Chung Cheng University
Department: Institute of Computer Science and Information Engineering
Discipline: Engineering
Field: Electrical and Computer Engineering
Thesis Type: Academic thesis
Year of Publication: 2013
Graduation Academic Year: 101 (ROC calendar)
Language: Chinese
Pages: 47
Keywords (Chinese): 圖形處理器 (GPU)、稀疏建構函式 (sparse construction function)、統一計算架構 (CUDA)
Keywords (English): GPU, CUDA, Fortran, sparse matrix, construction function
Usage statistics:
  • Cited by: 0
  • Views: 508
  • Downloads: 15
  • Bookmarked: 0
NVIDIA introduced CUDA (Compute Unified Device Architecture), a parallel computing architecture that exploits the powerful GPU for data-parallel computation and greatly improves computing performance. CUDA programs are written in a relatively approachable C-like language, but developers must understand the GPU architecture and parallel algorithm design in order to realize the GPU's full performance.

Practical applications in science, such as fluid dynamics, weather forecasting, seismic analysis, and genetic engineering, all involve sparse matrix computations that demand enormous computing power. The many-core GPU is therefore well suited to performing such computations in parallel at high performance. However, CUDA currently offers no sparse matrix library of this kind to help developers write parallel programs and shorten application development time, so developing a high-performance, easy-to-use sparse matrix library on the CUDA architecture is a highly practical contribution.

CUDA (Compute Unified Device Architecture) is a parallel computing platform and programming model created by NVIDIA that uses the powerful GPU for data-parallel computing. This architecture significantly improves computing performance, and the CUDA platform is accessible to software developers through extensions to industry-standard programming languages, including C. Developers must understand the GPU architecture and parallel algorithm design in order to achieve outstanding GPU performance.

Practical applications such as fluid dynamics, weather forecasting, seismic analysis, and genetic engineering involve sparse matrix operations that demand an enormous amount of computing power. The many-core GPU is well suited to such high-performance parallel computing; however, CUDA does not yet provide a sparse library to help developers write parallel programs and shorten application development time. Researching and developing a high-performance, user-friendly sparse library on CUDA is therefore a practical technology.

1、Introduction 1

2、Related Work 4
2.1 Architecture of GPU 4
2.2 Sparse matrix 9
2.3 CUDA 12

3、Sparse Construction Functions 18
3.1 Pack 18
3.2 Unpack 19
3.3 Reshape 21
3.4 Spread 22
3.5 Merge 22
3.6 Application 25

4、Optimization 28
4.1 Substituting register for global memory 29
4.2 Storing data in Shared memory with block 31
4.3 Splitting the mask array 35

5、Experimental Result 37
5.1 Environment 37
5.2 Benchmark 38
5.3 Result 38
5.3.1 The result of pack 39
5.3.2 The result of unpack 41
5.3.3 The result of reshape 42
5.3.4 The result of spread 43
5.3.5 The result of merge 44

6、Conclusion 46

7、REFERENCES 47


[1] http://en.wikipedia.org/wiki/Graphics_processing_unit
[2] Chang, R. G., Chuang, T. R., & Lee, J. K. (1998, July). Efficient support of parallel sparse computation for array intrinsic functions of Fortran 90. In Proceedings of the 12th International Conference on Supercomputing (pp. 45-52). ACM.
[3] Bell, N., & Garland, M. (2008). Efficient sparse matrix-vector multiplication on CUDA. NVIDIA Technical Report NVR-2008-004, NVIDIA Corporation.
[4] Hong, S., & Kim, H. (2009). An analytical model for a GPU architecture with memory-level and thread-level parallelism awareness. ACM SIGARCH Computer Architecture News, 37(3), 152-163.
[5] Wende, F., Cordes, F., & Steinke, T. (2012, July). On improving the performance of multi-threaded CUDA applications with concurrent kernel execution by kernel reordering. In Application Accelerators in High Performance Computing (SAAHPC), 2012 Symposium on (pp. 74-83). IEEE.
[6] Guo, P., & Wang, L. (2012, July). Accurate CUDA performance modeling for sparse matrix-vector multiplication. In High Performance Computing and Simulation (HPCS), 2012 International Conference on (pp. 496-502). IEEE.
[7] Bauer, M., Cook, H., & Khailany, B. (2011, November). CudaDMA: optimizing GPU memory bandwidth via warp specialization. In Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (p. 12). ACM.
[8] Christen, M., Schenk, O., & Burkhart, H. (2007, October). General-purpose sparse matrix building blocks using the NVIDIA CUDA technology platform. In First Workshop on General Purpose Processing on Graphics Processing Units.
[9] Oberhuber, T., Suzuki, A., & Vacata, J. (2010). New row-grouped CSR format for storing the sparse matrices on GPU with implementation in CUDA. arXiv preprint arXiv:1012.2270.
[10] Garland, M. (2008, June). Sparse matrix computations on manycore GPUs. In Proceedings of the 45th Annual Design Automation Conference (pp. 2-6). ACM.
[11] NVIDIA. CUDA C Programming Guide, Version 4.2.
[12] Xiao, S., & Feng, W. C. (2010, April). Inter-block GPU communication via fast barrier synchronization. In Parallel & Distributed Processing (IPDPS), 2010 IEEE International Symposium on (pp. 1-12). IEEE.
[13] NVIDIA CUDA GPU Computing Discussion Forum. http://forums.nvidia.com/index.php?showtopic=104243
[14] The Fortran 2003 Handbook.
[15] NVIDIA. NVIDIA's Next Generation CUDA Compute Architecture: Kepler GK110.
[16] Benchmark: The University of Florida Sparse Matrix Collection. http://www.cise.ufl.edu/research/sparse/matrices/index.html
[17] Davis, T. A. (2006). Direct Methods for Sparse Linear Systems. SIAM.
