National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)


Detailed Record

Author: Tsai, Sung-Han (蔡松翰)
Title: Optimization for sparse matrix-vector multiplication based on NVIDIA CUDA platform (以NVIDIA CUDA架構為基礎之稀疏矩陣乘積計算最佳化)
Advisor: Wei, Kai-Cheng (魏凱城)
Committee members: Wei, Kai-Cheng; Yang, Chao-Tung; Wu, Chao-Chin (魏凱城、楊朝棟、伍朝欽)
Oral defense date: 2017-07-28
Degree: Master's
Institution: National Changhua University of Education
Department: Computer Science and Information Engineering
Discipline: Engineering
Field: Electrical and Computer Engineering
Thesis type: Academic thesis
Year of publication: 2017
Graduation academic year: 105
Language: Chinese
Pages: 34
Keywords (Chinese): 稀疏矩陣向量乘積、矩陣排列、GPU、CUDA、平行計算
Keywords (English): Sparse Matrix-Vector Multiplication; matrix permutation; GPU; CUDA; parallel computing
Record statistics:
  • Cited by: 0
  • Views: 265
  • Downloads: 0
  • Bookmarked: 0
In recent years, sparse matrices have been applied widely in science and engineering, where large sparse matrices frequently arise when solving linear models. Storing a matrix in ELLPACK format reduces its storage space, but if a single row of the original matrix contains too many nonzero elements, a great deal of memory is still wasted; nevertheless, many studies have used ELLPACK to perform sparse matrix-vector multiplication (SpMV) on graphics processing units (GPUs). This thesis therefore applies the established Reverse Cuthill-McKee (RCM) algorithm to cluster the nonzero elements of a sparse matrix. The approach focuses on the Compressed Sparse Row (CSR) storage format: it reduces the sparse matrix's storage space and fine-tunes the index values so as to shorten the time needed to compute SpMV on the GPU. Because SpMV has a low ratio of computation to data access and its performance is bounded by memory bandwidth, we build on the CSR storage format and investigate two directions: (1) reducing cache misses to strengthen vector locality and raise performance, and (2) compressing the matrix and accessing the nonzero elements' index values to reduce the number of matrix accesses, thereby optimizing performance.
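The abstract points out that ELLPACK pads every row to the length of the longest row, so one unusually dense row inflates storage for the whole matrix. A minimal sketch of that effect (illustrative only, not the thesis code; the `to_ellpack` helper and the `-1` padding marker are our own assumptions):

```python
# Illustrative sketch: ELLPACK pads every row to the width of the longest
# row, so a single dense row wastes padded slots in every other row.

def to_ellpack(rows, pad=0):
    """rows: list of {column: value} dicts, one per matrix row.
    Returns (values, col_idx) as row-major lists padded to the max row length."""
    width = max(len(r) for r in rows)          # the longest row sets the width
    values, col_idx = [], []
    for r in rows:
        cols = sorted(r)
        values.append([r[c] for c in cols] + [pad] * (width - len(r)))
        col_idx.append(cols + [-1] * (width - len(r)))  # -1 marks padding
    return values, col_idx

# A 4x4 matrix whose first row is dense while the rest hold one nonzero each:
rows = [{0: 1.0, 1: 2.0, 2: 3.0, 3: 4.0}, {1: 5.0}, {2: 6.0}, {3: 7.0}]
values, col_idx = to_ellpack(rows)
stored = sum(len(v) for v in values)   # 16 slots allocated (4 rows * width 4)
nonzeros = sum(len(r) for r in rows)   # only 7 actual nonzeros
```

Here 9 of the 16 stored slots are padding, which is exactly the waste the thesis avoids by working with CSR instead.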
In recent years, large sparse matrices have arisen often in fields such as science and engineering, typically when solving linear models. Using the ELLPACK format to store sparse matrices reduces the matrix storage space, but if any single row of the original sparse matrix holds too many nonzero elements, considerable memory is still wasted. Much research has focused on Sparse Matrix-Vector Multiplication (SpMV) in ELLPACK format on the Graphics Processing Unit (GPU). The purpose of our research is therefore to reduce the access space of a sparse matrix stored in Compressed Sparse Row (CSR) format after applying the Reverse Cuthill-McKee (RCM) algorithm, in order to accelerate SpMV on the GPU. Because SpMV has a low data-reuse ratio, its performance is restricted by memory bandwidth. Our proposal builds on the CSR format from two aspects: (1) reducing cache misses to enhance vector locality and raise performance, and (2) reducing the amount of matrix data accessed through index reduction to optimize performance.
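The approach the abstract describes pairs CSR storage with RCM reordering. The sketch below shows both pieces in plain Python standing in for the GPU kernel (a CUDA implementation would assign one thread or warp per row of the same loop); the function names and the BFS-based `rcm_order` are our own illustrative assumptions, not the thesis implementation:

```python
from collections import deque

def csr_spmv(row_ptr, col_idx, vals, x):
    """y = A @ x with A in CSR form: row_ptr delimits each row's slice
    of the packed vals/col_idx arrays. This per-row loop is what a GPU
    kernel parallelizes across threads."""
    y = [0.0] * (len(row_ptr) - 1)
    for i in range(len(y)):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += vals[k] * x[col_idx[k]]
    return y

def rcm_order(adj):
    """Reverse Cuthill-McKee over an adjacency list of a symmetric
    sparsity pattern: BFS from a minimum-degree vertex, visiting
    neighbors in order of increasing degree, then reverse the order."""
    n = len(adj)
    visited, order = [False] * n, []
    for start in sorted(range(n), key=lambda v: len(adj[v])):
        if visited[start]:
            continue
        visited[start] = True
        q = deque([start])
        while q:
            v = q.popleft()
            order.append(v)
            for w in sorted(adj[v], key=lambda u: len(adj[u])):
                if not visited[w]:
                    visited[w] = True
                    q.append(w)
    return order[::-1]

# CSR form of [[1,0,2],[0,3,0],[4,0,5]] multiplied by x = [1,1,1]:
y = csr_spmv([0, 2, 3, 5], [0, 2, 1, 0, 2],
             [1.0, 2.0, 3.0, 4.0, 5.0], [1.0, 1.0, 1.0])
```

Permuting the matrix by the returned ordering concentrates nonzeros near the diagonal (lower bandwidth), so consecutive rows touch nearby entries of x, which is the cache-locality effect the thesis exploits.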
Abstract (Chinese)
Abstract (English)
Acknowledgements
Table of Contents
List of Figures
List of Tables
Chapter 1: Introduction
Chapter 2: Background
2.1 Graphics Processing Unit (GPU)
2.2 Compute Unified Device Architecture (CUDA)
2.3 RCM Algorithm
2.4 CSR Format
2.5 Related Work
Chapter 3: Methodology
3.1 Main Concept
3.2 Execution Architecture
3.3 Algorithm Properties
Chapter 4: Experimental Results and Discussion
Chapter 5: Conclusions and Future Work
References
[1] NVIDIA GPU. (2012). What Is GPU Accelerated Computing? [Online]. Available: http://www.nvidia.com/object/what-is-gpu-computing.html
[2] G. Colin de Verdière, "Introduction to GPGPU, a hardware and software background," Comptes Rendus Mécanique, vol. 339, no. 2–3, pp. 78–89, Feb.–Mar. 2011.
[3] NVIDIA CUDA. (2012). CUDA Parallel Computing Platform [Online]. Available: http://www.nvidia.com/object/cuda_home_new.html
[4] T. R. Halfhill, "Parallel processing with CUDA: NVIDIA's high-performance computing platform uses massive multithreading," Microprocessor Rep., pp. 1–8, Jan. 2008.
[5] NVIDIA. (2009). NVIDIA CUDA 2.0 Programming Guide [Online]. Available: http://developer.download.nvidia.com/compute/cuda/2_0/docs/NVIDIA_CUDA_Programming_Guide_2.0.pdf
[6] NVIDIA CUDA. (2014). CUDA C Programming Guide [Online]. Available: http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html
[7] M. Czapiński and S. Barnes, "Tabu Search with two approaches to parallel flowshop evaluation on CUDA platform," J. Parallel Distrib. Comput., vol. 71, pp. 802–811, 2011.
[8] S. Park, S. Y. Shin, and K. B. Hwang, "CFMDS: CUDA-based fast multidimensional scaling for genome-scale data," BMC Bioinformatics, 2012.
[10] C. Hou, J. Xu, P. Wang, W. Huang, and X. Wang, "Efficient GPU-accelerated molecular dynamics simulation of solid covalent crystals," Computer Physics Communications, vol. 184, no. 5, pp. 1364–1371, May 2013.
[11] T. Udagawa and M. Sekijima, "Energy consumption of GPU with molecular dynamics," Proceedings of the IASTED International Conference on Parallel and Distributed Computing and Systems, pp. 40–44.
[12] A. J. C. Crespo, J. M. Dominguez, D. Valdez-Balderas, B. D. Rogers, and M. Gomez-Gesteira, "Smoothed particle hydrodynamics on GPU computing," 2nd International Conference on Particle-Based Methods (PARTICLES 2011), pp. 922–929.
[13] E. Rustico, G. Bilotta, G. Gallo, A. Hérault, and C. Del Negro, "Smoothed particle hydrodynamics simulations on multi-GPU systems," Proceedings of the 20th Euromicro International Conference on Parallel, Distributed and Network-Based Processing, pp. 384–391.
[14] Z. Yang, Y. Zhu, and Y. Pu, "Parallel image processing based on CUDA," Computer Science and Software Engineering, vol. 3, pp. 198–201, 2008.
[15] E. Cuthill and J. McKee, "Reducing the bandwidth of sparse symmetric matrices," Proceedings of the 1969 24th National Conference, ACM, 1969.
[16] A. George, Computer Implementation of the Finite Element Method, PhD thesis, 1971.
[17] F. Checconi, F. Petrini, J. Willcock, A. Lumsdaine, A. R. Choudhury, and Y. Sabharwal, "Breaking the speed and scalability barriers for graph exploration on distributed-memory machines," in International Conference for High Performance Computing, Networking, Storage and Analysis (SC'12), IEEE, 2012, pp. 1–12.
[18] A. Buluç and K. Madduri, "Parallel breadth-first search on distributed memory systems," in International Conference for High Performance Computing, Networking, Storage and Analysis (SC'11), ACM, 2011, pp. 65:1–65:12.
[19] S. Xu, W. Xue, and H. X. Lin, "Performance modeling and optimization of sparse matrix-vector multiplication on NVIDIA CUDA platform," The Journal of Supercomputing, pp. 1–12, 2013.
[20] N. Bell and M. Garland, "Implementing sparse matrix-vector multiplication on throughput-oriented processors," Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, ACM, 2009.
[21] B. Neelima, G. Ram Mohana Reddy, and P. S. Raghavendra, "Predicting an optimal sparse matrix format for SpMV computation on GPU," Parallel & Distributed Processing Symposium Workshops (IPDPSW), IEEE, 2014.
[22] A. Monakov, A. Lokhmotov, and A. Avetisyan, "Automatically tuning sparse matrix-vector multiplication for GPU architectures," HiPEAC 5952, pp. 111–125, 2010.
[23] F. Vazquez et al., "Improving the performance of the sparse matrix vector product with GPUs," 2010 IEEE 10th International Conference on Computer and Information Technology (CIT), IEEE, 2010.
[24] V. Karakasis et al., "An extended compression format for the optimization of sparse matrix-vector multiplication," IEEE Transactions on Parallel and Distributed Systems, vol. 24, no. 10, pp. 1930–1940, 2013.
[25] K. K. Matam and K. Kothapalli, "Accelerating sparse matrix vector multiplication in iterative methods using GPU," 2011 International Conference on Parallel Processing (ICPP), IEEE, 2011.
[26] Y. Saad, "SPARSKIT: A basic tool kit for sparse matrix computations," 1990.