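Implementing the Jacobi-Davidson Algorithm on Multi-core Graphics Processing Units

Abstract (Chinese-language version)

The Jacobi-Davidson method converges rapidly when solving large sparse eigenvalue problems. In recent years, however, data sizes have grown steadily, so that even with this rapid convergence the computation still incurs a substantial research cost. Using a graphics processing unit (GPU) as a co-processor to reduce that cost has therefore become increasingly important.

This thesis investigates how to accelerate the Jacobi-Davidson method with a GPU. The work covers basic linear algebra operations, namely matrix multiplication, the vector inner product, and the solution of large sparse linear systems, and analyzes performance before and after GPU acceleration.

The results show that the GPU computations are correct. For the basic linear algebra operations, the GPU achieves a speedup of 1.95 to 4.638 times. The overall computation time of the Jacobi-Davidson method, however, is close to that of the CPU version. Possible causes are the time spent on memory transfers and the relatively low clock rate of the GPU used in this experiment.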
Accelerating Jacobi-Davidson Method using Multi-core Graphics Processing Unit

Abstract

The Jacobi-Davidson method (JDM) converges rapidly when solving large sparse eigenvalue problems. However, because the matrices involved are huge, the computation still incurs a substantial research cost. This motivates us to employ the graphics processing unit (GPU) to accelerate the JDM. Under the framework of the Compute Unified Device Architecture (CUDA), several linear algebraic operations, including matrix-matrix multiplication, the vector inner product, and the solution of sparse linear systems, are accelerated on the GPU. To evaluate the performance of our code, we run these operations and the overall JDM both with and without the GPU. The results show that the solutions computed by the GPU are correct. Moreover, the GPU versions of these linear algebraic operations achieve a 1.95 to 4.63 times speedup over the CPU versions. However, the performance of the overall JDM on the GPU is comparable to that on the CPU. This may be due to the extra work of memory transfers in our GPU code, or to the slower clock rate of our GPU.
Table of Contents

Chapter 1 Introduction
    1.1 Research Background
    1.2 Research Motivation
    1.3 Research Objectives
    1.4 Thesis Organization
Chapter 2 CUDA Background
    2.1 CUDA
        2.1.1 CUDA Parallel Programming Model
        2.1.2 CUDA Memory Model
        2.1.3 NVIDIA GeForce GT 740 M Hardware Overview
        2.1.4 CUDA Kernel
        2.1.5 CUDA Runtime API
        2.1.6 The __syncthreads() Function
    2.2 CUDA Parallelization Methods
Chapter 3 Parallelizing the Jacobi-Davidson Method
    3.1 Introduction to the Jacobi-Davidson Method
    3.2 Jacobi-Davidson Method
    3.3 Parallelizing the Jacobi-Davidson Method with CUDA
Chapter 4 Experimental Results
    4.1 Experimental Environment
    4.2 Experimental Method
    4.3 Test Problem
    4.4 Matrix-Vector Multiplication
    4.5 Vector Inner Product
    4.6 Jacobi Method
    4.7 Jacobi-Davidson Method
Chapter 5 Conclusion
References

List of Figures

Figure 1-1 Comparison of GPU and CPU peak floating-point performance
Figure 1-2 Comparison of GPU and CPU memory bandwidth
Figure 1-3 Large number of execution units
Figure 2-1 Parallel programming model
Figure 2-2 CUDA memory model
Figure 2-3 Parallel computation process
Figure 4-1 Line chart of GPU and CPU matrix-vector multiplication time
Figure 4-2 Line chart of GPU and CPU vector inner product time
Figure 4-3 Line chart of GPU and CPU Jacobi method time
Figure 4-4 Line chart of GPU and CPU Jacobi-Davidson method time

List of Tables

Table 2-1 Comparison of GPU and CPU times before and after optimization
Table 4-1 Overview of the CPU and GPU architectures
Table 4-2 Eigenvalue results
Table 4-3 Performance comparison of CPU and GPU matrix-vector multiplication
Table 4-4 Performance comparison of CPU and GPU vector inner product
Table 4-5 Performance comparison of CPU and GPU Jacobi method
Table 4-6 Performance comparison of CPU and GPU Jacobi-Davidson method
