National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)


Detailed Record

Author: Tsai, Sung-Han (蔡松翰)
Title: Optimization for sparse matrix-vector multiplication based on NVIDIA CUDA platform (以NVIDIA CUDA架構為基礎之稀疏矩陣乘積計算最佳化)
Advisor: Wei, Kai-Cheng (魏凱城)
Committee members: Wei, Kai-Cheng; Yang, Chao-Tung; Wu, Chao-Chin (魏凱城、楊朝棟、伍朝欽)
Oral defense date: 2017-07-28
Degree: Master's
Institution: National Changhua University of Education
Department: Computer Science and Information Engineering
Discipline: Engineering
Field: Electrical and Computer Engineering
Thesis type: Academic thesis
Year of publication: 2017
Graduation academic year: 105
Language: Chinese
Pages: 34
Keywords (Chinese): 稀疏矩陣向量乘積、矩陣排列、GPU、CUDA、平行計算
Keywords (English): Sparse Matrix-Vector Multiplication; matrix permutation; GPU; CUDA; parallel computing
Record statistics:
  • Cited by: 0
  • Views: 265
  • Downloads: 0
  • Bookmarked: 0
In recent years, sparse matrices have been applied widely in science and engineering, where large sparse matrices frequently arise when solving linear models. Storing a matrix in ELLPACK format reduces its storage space, but if a single row of the original matrix contains too many nonzero elements, a great deal of memory is still wasted; nevertheless, many studies have used ELLPACK to perform sparse matrix-vector multiplication (SpMV) on graphics processing units (GPUs). This thesis therefore applies the established Reverse Cuthill-McKee (RCM) algorithm to cluster the nonzero elements of a sparse matrix. The approach focuses on the Compressed Sparse Row (CSR) storage format: it reduces the sparse matrix's storage space and fine-tunes the index values so as to shorten the time needed to compute SpMV on the GPU. Because SpMV has a low ratio of computation to data access and its performance is bounded by memory bandwidth, we build on the CSR storage format and investigate two directions: (1) reducing cache misses to strengthen vector locality and raise performance, and (2) compressing the matrix and accessing the nonzero elements' index values to reduce the number of matrix accesses, thereby optimizing performance.
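The abstract points out that ELLPACK pads every row to the length of the longest row, so one unusually dense row inflates storage for the whole matrix. A minimal sketch of that effect (illustrative only, not the thesis code; the `to_ellpack` helper and the `-1` padding marker are our own assumptions):

```python
# Illustrative sketch: ELLPACK pads every row to the width of the longest
# row, so a single dense row wastes padded slots in every other row.

def to_ellpack(rows, pad=0):
    """rows: list of {column: value} dicts, one per matrix row.
    Returns (values, col_idx) as row-major lists padded to the max row length."""
    width = max(len(r) for r in rows)          # the longest row sets the width
    values, col_idx = [], []
    for r in rows:
        cols = sorted(r)
        values.append([r[c] for c in cols] + [pad] * (width - len(r)))
        col_idx.append(cols + [-1] * (width - len(r)))  # -1 marks padding
    return values, col_idx

# A 4x4 matrix whose first row is dense while the rest hold one nonzero each:
rows = [{0: 1.0, 1: 2.0, 2: 3.0, 3: 4.0}, {1: 5.0}, {2: 6.0}, {3: 7.0}]
values, col_idx = to_ellpack(rows)
stored = sum(len(v) for v in values)   # 16 slots allocated (4 rows * width 4)
nonzeros = sum(len(r) for r in rows)   # only 7 actual nonzeros
```

Here 9 of the 16 stored slots are padding, which is exactly the waste the thesis avoids by working with CSR instead.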
In recent years, large sparse matrices have arisen often in fields such as science and engineering, typically when solving linear models. Using the ELLPACK format to store sparse matrices reduces the matrix storage space, but if any single row of the original sparse matrix holds too many nonzero elements, considerable memory is still wasted. Much research has focused on Sparse Matrix-Vector Multiplication (SpMV) in ELLPACK format on the Graphics Processing Unit (GPU). The purpose of our research is therefore to reduce the access space of a sparse matrix stored in Compressed Sparse Row (CSR) format after applying the Reverse Cuthill-McKee (RCM) algorithm, in order to accelerate SpMV on the GPU. Because SpMV has a low data-reuse ratio, its performance is restricted by memory bandwidth. Our proposal builds on the CSR format from two aspects: (1) reducing cache misses to enhance vector locality and raise performance, and (2) reducing the amount of matrix data accessed through index reduction to optimize performance.
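The approach the abstract describes pairs CSR storage with RCM reordering. The sketch below shows both pieces in plain Python standing in for the GPU kernel (a CUDA implementation would assign one thread or warp per row of the same loop); the function names and the BFS-based `rcm_order` are our own illustrative assumptions, not the thesis implementation:

```python
from collections import deque

def csr_spmv(row_ptr, col_idx, vals, x):
    """y = A @ x with A in CSR form: row_ptr delimits each row's slice
    of the packed vals/col_idx arrays. This per-row loop is what a GPU
    kernel parallelizes across threads."""
    y = [0.0] * (len(row_ptr) - 1)
    for i in range(len(y)):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += vals[k] * x[col_idx[k]]
    return y

def rcm_order(adj):
    """Reverse Cuthill-McKee over an adjacency list of a symmetric
    sparsity pattern: BFS from a minimum-degree vertex, visiting
    neighbors in order of increasing degree, then reverse the order."""
    n = len(adj)
    visited, order = [False] * n, []
    for start in sorted(range(n), key=lambda v: len(adj[v])):
        if visited[start]:
            continue
        visited[start] = True
        q = deque([start])
        while q:
            v = q.popleft()
            order.append(v)
            for w in sorted(adj[v], key=lambda u: len(adj[u])):
                if not visited[w]:
                    visited[w] = True
                    q.append(w)
    return order[::-1]

# CSR form of [[1,0,2],[0,3,0],[4,0,5]] multiplied by x = [1,1,1]:
y = csr_spmv([0, 2, 3, 5], [0, 2, 1, 0, 2],
             [1.0, 2.0, 3.0, 4.0, 5.0], [1.0, 1.0, 1.0])
```

Permuting the matrix by the returned ordering concentrates nonzeros near the diagonal (lower bandwidth), so consecutive rows touch nearby entries of x, which is the cache-locality effect the thesis exploits.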
Abstract (Chinese)
Abstract (English)
Acknowledgements
Table of Contents
List of Figures
List of Tables
Chapter 1: Introduction
Chapter 2: Background
2.1 Graphics Processing Unit (GPU)
2.2 Compute Unified Device Architecture (CUDA)
2.3 RCM Algorithm
2.4 CSR Format
2.5 Related Work
Chapter 3: Methodology
3.1 Main Concept
3.2 Execution Architecture
3.3 Algorithm Properties
Chapter 4: Experimental Results and Discussion
Chapter 5: Conclusions and Future Work
References
[1] NVIDIA GPU. (2012). What Is GPU Accelerated Computing? [Online]. Available: http://www.nvidia.com/object/what-is-gpu-computing.html
[2] G. Colin de Verdière, "Introduction to GPGPU, a hardware and software background," Comptes Rendus Mécanique, vol. 339, no. 2–3, pp. 78–89, Feb.–Mar. 2011.
[3] NVIDIA CUDA. (2012). CUDA Parallel Computing Platform [Online]. Available: http://www.nvidia.com/object/cuda_home_new.html
[4] T. R. Halfhill, "Parallel processing with CUDA: NVIDIA's high-performance computing platform uses massive multithreading," Microprocessor Rep., pp. 1–8, Jan. 2008.
[5] NVIDIA. (2009). NVIDIA CUDA 2.0 Programming Guide [Online]. Available: http://developer.download.nvidia.com/compute/cuda/2_0/docs/NVIDIA_CUDA_Programming_Guide_2.0.pdf
[6] NVIDIA CUDA. (2014). CUDA C Programming Guide [Online]. Available: http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html
[7] M. Czapiński and S. Barnes, "Tabu Search with two approaches to parallel flowshop evaluation on CUDA platform," J. Parallel Distrib. Comput., vol. 71, pp. 802–811, 2011.
[8] S. Park, S. Y. Shin, and K. B. Hwang, "CFMDS: CUDA-based fast multidimensional scaling for genome-scale data," BMC Bioinformatics, 2012.
[10] C. Hou, J. Xu, P. Wang, W. Huang, and X. Wang, "Efficient GPU-accelerated molecular dynamics simulation of solid covalent crystals," Computer Physics Communications, vol. 184, no. 5, pp. 1364–1371, May 2013.
[11] T. Udagawa and M. Sekijima, "Energy consumption of GPU with molecular dynamics," Proceedings of the IASTED International Conference on Parallel and Distributed Computing and Systems, pp. 40–44.
[12] A. J. C. Crespo, J. M. Dominguez, D. Valdez-Balderas, B. D. Rogers, and M. Gomez-Gesteira, "Smoothed particle hydrodynamics on GPU computing," 2nd International Conference on Particle-Based Methods (PARTICLES 2011), pp. 922–929.
[13] E. Rustico, G. Bilotta, G. Gallo, A. Hérault, and C. Del Negro, "Smoothed particle hydrodynamics simulations on multi-GPU systems," Proceedings of the 20th Euromicro International Conference on Parallel, Distributed and Network-Based Processing, pp. 384–391.
[14] Z. Yang, Y. Zhu, and Y. Pu, "Parallel image processing based on CUDA," Computer Science and Software Engineering, vol. 3, pp. 198–201, 2008.
[15] E. Cuthill and J. McKee, "Reducing the bandwidth of sparse symmetric matrices," Proceedings of the 1969 24th National Conference, ACM, 1969.
[16] A. George, Computer Implementation of the Finite Element Method, PhD thesis, 1971.
[17] F. Checconi, F. Petrini, J. Willcock, A. Lumsdaine, A. R. Choudhury, and Y. Sabharwal, "Breaking the speed and scalability barriers for graph exploration on distributed-memory machines," in International Conference for High Performance Computing, Networking, Storage and Analysis (SC'12), IEEE, 2012, pp. 1–12.
[18] A. Buluç and K. Madduri, "Parallel breadth-first search on distributed memory systems," in International Conference for High Performance Computing, Networking, Storage and Analysis (SC'11), ACM, 2011, pp. 65:1–65:12.
[19] S. Xu, W. Xue, and H. X. Lin, "Performance modeling and optimization of sparse matrix-vector multiplication on NVIDIA CUDA platform," The Journal of Supercomputing, pp. 1–12, 2013.
[20] N. Bell and M. Garland, "Implementing sparse matrix-vector multiplication on throughput-oriented processors," Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, ACM, 2009.
[21] B. Neelima, G. Ram Mohana Reddy, and P. S. Raghavendra, "Predicting an optimal sparse matrix format for SpMV computation on GPU," Parallel & Distributed Processing Symposium Workshops (IPDPSW), IEEE, 2014.
[22] A. Monakov, A. Lokhmotov, and A. Avetisyan, "Automatically tuning sparse matrix-vector multiplication for GPU architectures," HiPEAC 5952, pp. 111–125, 2010.
[23] F. Vazquez et al., "Improving the performance of the sparse matrix vector product with GPUs," 2010 IEEE 10th International Conference on Computer and Information Technology (CIT), IEEE, 2010.
[24] V. Karakasis et al., "An extended compression format for the optimization of sparse matrix-vector multiplication," IEEE Transactions on Parallel and Distributed Systems, vol. 24, no. 10, pp. 1930–1940, 2013.
[25] K. K. Matam and K. Kothapalli, "Accelerating sparse matrix vector multiplication in iterative methods using GPU," 2011 International Conference on Parallel Processing (ICPP), IEEE, 2011.
[26] Y. Saad, "SPARSKIT: A basic tool kit for sparse matrix computations," 1990.