Author: 顏煌庭
Author (English): Huang Ting Yen
Title: 基於英特爾多核心處理器與協同處理器的環境底下有效率的三維稀疏矩陣壓縮策略
Title (English): Efficient Strategies of Compressing Three-Dimensional Sparse Arrays Based on Intel Xeon and Intel Xeon Phi Environments
Advisor: 林俊淵
Advisor (English): C. Y. Lin
Degree: Master's
Institution: Chang Gung University (長庚大學)
Department: Department of Computer Science and Information Engineering (資訊工程學系)
Discipline: Engineering
Field: Electrical and Computer Engineering
Document type: Academic thesis
Year of publication: 2016
Graduation academic year: 104 (2015-2016)
Language: Chinese
Pages: 56
Keywords (Chinese): 矩陣運算 (array operations); 稀疏矩陣的壓縮方法 (sparse array compression methods); 平行處理 (parallel processing); 多核心處理器 (multi-core processors)
Keywords (English): array operations; sparse array compression method; parallel processing; multiprocessor; accelerator
Usage statistics:
  • Times cited: 0
  • Views: 110
  • Downloads: 0
  • Bookmarked: 0
Abstract (Chinese): Sparse array operations lie at the core of many scientific computations, such as molecular dynamics, finite-element methods, and meteorology. With recent advances in technology, more and more applications of three-dimensional sparse arrays have been proposed and studied, such as geological analysis and three-dimensional image processing. In these research fields, array operations often take a great deal of time, and many previous studies have pointed out that compressing sparse arrays before performing array operations, so as to avoid unnecessary computation, is essential. For many practical sparse array applications, how to compress the sparse arrays efficiently is therefore an important issue, and parallel processing, for example with multi-core processors, multiple machines, or accelerators, is well suited to raising the performance of array compression. This thesis therefore proposes efficient strategies for compressing three-dimensional sparse arrays on the Intel Xeon (multi-core processor) and the Intel Xeon Phi (accelerator): inter-task parallelization and intra-task parallelization, each designed for a different class of three-dimensional sparse array applications. In the final experiments, on the Intel Xeon E5-2670 v2 and the Intel Xeon Phi SE10X, the inter-task parallelization strategy achieves 16x and 18x speedups, respectively, and the intra-task parallelization strategy achieves 4x and 5x speedups, respectively.
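The compression the abstract refers to is, per Section 2.1 of the table of contents below, CRS (Compressed Row Storage) applied to two- and three-dimensional sparse arrays. As a rough illustration only, the following C sketch compresses a dense 3D array into a CRS-style structure by viewing it as nx*ny rows of length nz; the struct layout and the name crs3d_compress are assumptions made for this sketch and need not match the data structures actually used in the thesis.

#include <stdlib.h>

/* Dimensions plus the three CRS-style arrays: ro records where each row's
 * non-zeros start, co records the position of each non-zero inside its row,
 * and val holds the non-zero values themselves. */
typedef struct {
    int nx, ny, nz;
    int *ro;      /* size nx*ny + 1 */
    int *co;      /* size nnz       */
    double *val;  /* size nnz       */
    int nnz;
} crs3d_t;

/* Compress a dense, row-major nx*ny*nz array by viewing it as nx*ny rows
 * of length nz and keeping only the non-zero entries of each row. */
crs3d_t crs3d_compress(const double *a, int nx, int ny, int nz)
{
    crs3d_t c = { nx, ny, nz, NULL, NULL, NULL, 0 };
    int rows = nx * ny;

    /* First pass: count non-zeros so the output arrays can be sized exactly. */
    for (long i = 0; i < (long)rows * nz; ++i)
        if (a[i] != 0.0) c.nnz++;

    c.ro  = malloc(((size_t)rows + 1) * sizeof *c.ro);
    c.co  = malloc((size_t)c.nnz * sizeof *c.co);
    c.val = malloc((size_t)c.nnz * sizeof *c.val);

    /* Second pass: walk each row, copy its non-zeros, and remember where
     * the row begins inside co/val. */
    int k = 0;
    for (int r = 0; r < rows; ++r) {
        c.ro[r] = k;
        for (int z = 0; z < nz; ++z) {
            double v = a[(size_t)r * nz + z];
            if (v != 0.0) { c.co[k] = z; c.val[k] = v; ++k; }
        }
    }
    c.ro[rows] = k;
    return c;
}

Subsequent array operations then touch only the nnz stored entries rather than all nx*ny*nz cells, which is the saving the abstract describes.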
Abstract (English): Array operations are central to many important scientific codes, such as molecular dynamics, finite-element methods, and atmosphere and ocean sciences. In recent years, more and more applications, such as geological analysis and image processing, are solved by array operations on three-dimensional (3D) sparse arrays. Because of the huge computation time, it is necessary to compress the sparse arrays in order to avoid unnecessary computations. Parallel processing, based on multiprocessors, multicomputers, and accelerators, is also a suitable way to speed up array operations. Compressing the sparse arrays efficiently is therefore the first task in designing parallel algorithms for practical applications with sparse array operations. Hence, efficient strategies for compressing 3D sparse arrays in the Intel Xeon (multiprocessor) and Intel Xeon Phi (accelerator) environments are proposed in this thesis. For each environment, two strategies, inter-task and intra-task parallelization, are presented to compress a series of sparse arrays and a single large sparse array, respectively. In the experimental results, the inter-task parallelization strategy achieves 16x and 18x speedups on the Intel Xeon E5-2670 v2 and the Intel Xeon Phi SE10X, respectively, and the intra-task parallelization strategy achieves 4x and 5x speedups, respectively.
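The abstract distinguishes inter-task parallelization (a series of independent sparse arrays, one compression task per array) from intra-task parallelization (a single large array split into pieces that are compressed concurrently and later merged). The OpenMP sketch below only illustrates, under stated assumptions, how such a split could be expressed: it reuses the hypothetical crs3d_t and crs3d_compress from the previous sketch, and the slab partitioning along the first dimension, the function names, and the omitted merge step are illustrative choices, not the ETP/RTP algorithms of Chapter 3.

#include <stddef.h>
#include <omp.h>

/* CRS structure and compression routine from the previous sketch. */
typedef struct {
    int nx, ny, nz;
    int *ro, *co;
    double *val;
    int nnz;
} crs3d_t;
crs3d_t crs3d_compress(const double *a, int nx, int ny, int nz);

/* Inter-task style: a series of independent 3D sparse arrays of equal size,
 * each compressed by whichever thread picks up its loop iteration. */
void compress_series(const double **arrays, crs3d_t *out,
                     int count, int nx, int ny, int nz)
{
    #pragma omp parallel for schedule(dynamic)
    for (int i = 0; i < count; ++i)
        out[i] = crs3d_compress(arrays[i], nx, ny, nz);
}

/* Intra-task style: one large array is cut into slabs along its first
 * dimension, each slab is compressed independently, and the per-slab
 * results would afterwards be merged into one structure (merge omitted).
 * The caller is assumed to size slabs[] by omp_get_max_threads(). */
void compress_single_large(const double *a, crs3d_t *slabs,
                           int nx, int ny, int nz)
{
    #pragma omp parallel
    {
        int nthreads = omp_get_num_threads();
        int t        = omp_get_thread_num();
        int chunk    = (nx + nthreads - 1) / nthreads;
        int x0       = t * chunk;
        int x1       = (x0 + chunk < nx) ? x0 + chunk : nx;
        if (x0 < nx)
            slabs[t] = crs3d_compress(a + (size_t)x0 * ny * nz,
                                      x1 - x0, ny, nz);
    }
}

The extra split-and-merge bookkeeping in the intra-task form is one plausible reason its reported gains (4x-5x) are smaller than those of the inter-task form (16x-18x), where every core simply works on its own array.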
Table of Contents
Advisor's Recommendation
Oral Defense Committee Approval
Acknowledgements iii
Abstract (Chinese) iv
Abstract (English) v
Table of Contents vi
List of Figures viii
List of Tables x
Chapter 1: Introduction - 1 -
Chapter 2: Preliminary Concepts - 7 -
2.1 CRS Compression of Two- and Three-Dimensional Sparse Arrays - 7 -
2.2 A Parallel Algorithm for Compressing Two-Dimensional Sparse Arrays on GPUs - 10 -
2.3 An Overview of Intel Xeon and Intel Xeon Phi - 11 -
Chapter 3: Methodology - 14 -
3.1 The Compression Process for Three-Dimensional Arrays - 15 -
3.2 The ETP Strategy - 16 -
3.3 The RTP Strategy - 19 -
Chapter 4: Experimental Results - 24 -
4.1 Experimental Results of the ETP Strategy - 27 -
4.2 Experimental Results of the RTP Strategy - 35 -
Chapter 5: Conclusions - 41 -
References - 43 -



List of Figures
Figure 2-1: A two-dimensional sparse array - 8 -
Figure 2-2: The CRS compression format of a two-dimensional array - 8 -
Figure 2-3: A three-dimensional sparse array - 9 -
Figure 2-4: CRS compression of a three-dimensional sparse array - 9 -
Figure 2-5: The Intel Xeon Phi architecture - 12 -
Figure 3-1: Pseudo code - 16 -
Figure 3-2: Work distribution in the ETP strategy - 17 -
Figure 3-3: Scatter affinity - 18 -
Figure 3-4: ETP pseudo code - 18 -
Figure 3-5: Data partitioning in the RTP strategy - 20 -
Figure 3-6: Data compression in the RTP strategy - 21 -
Figure 3-7: Data integration in the RTP strategy - 23 -
Figure 3-8: RTP pseudo code - 23 -
Figure 4-1 (a): The speedup ratios for 244 test 3D sparse arrays with a size of 180×180×180 by the ETP strategy on the Intel Xeon E5-2670 v2 - 28 -
Figure 4-1 (b): The speedup ratios for 244 test 3D sparse arrays with a size of 180×180×180 by the ETP strategy on the Intel Xeon Phi SE10X - 29 -
Figure 4-1 (c): The speedup ratios for 244 test 3D sparse arrays with a size of 180×180×180 by the ETP strategy on the Intel Xeon Phi SE10X - 29 -
Figure 4-2 (a): The speedup ratios for test 3D sparse arrays with a size of 180×180×180 by the ETP strategy on the Intel Xeon E5-2670 - 30 -
Figure 4-2 (b): The speedup ratios for test 3D sparse arrays with a size of 120×120×120 by the ETP strategy on the Intel Xeon E5-2670 - 31 -
Figure 4-2 (c): The speedup ratios for test 3D sparse arrays with a size of 60×60×60 by the ETP strategy on the Intel Xeon E5-2670 - 31 -
Figure 4-3: The speedup ratios for test 3D sparse arrays with sizes of 180×180×180, 120×120×120, and 60×60×60 by the ETP strategy on the Intel Xeon Phi SE10X - 34 -
Figure 4-4: The speedup ratios for single test 3D sparse arrays with sizes of 768×768×768, 976×976×976, and 1024×1024×1024 by the RTP strategy on the Intel Xeon E5-2670 v2 and the Intel Xeon Phi SE10X - 36 -
Figure 4-5: The speedup ratios for test 3D sparse arrays with sizes of 768×768×768, 976×976×976, and 1024×1024×1024 by the RTP strategy on the Intel Xeon E5-2670 v2 - 38 -
Figure 4-6: The speedup ratios for test 3D sparse arrays with sizes of 768×768×768, 976×976×976, and 1024×1024×1024 by the RTP strategy on the Intel Xeon Phi SE10X - 40 -

List of Tables
Table 4-1 - 24 -
Table 4-2 - 26 -
Table 4-3 - 27 -


[1] Cullum JK and Willoughby RA (1985) Lanczos Algorithms for Large Symmetric Eigenvalue Computations. Birkhauser, Boston.
[2] Golub GH and Loan CFV (1989) Matrix Computations, 2nd Edition, The Johns Hopkins University Press, Baltimore, Maryland 21218.
[3] Duff I, Grimes R, and Lewis J. (1989) Sparse matrix test problems. ACM Trans Math Softw, 15(1): 1-14.
[4] McKinley KS, Carr S, and Tseng CW (1996) Improving Data Locality with Loop Transformations. ACM Trans Program Lang Syst, 18(4): 424-453.
[5] Lin CY, Liu JS, and Chung YC (2002) Efficient Representation Scheme for Multi-Dimensional Array Operations. IEEE Trans Comput, 51(3):327-345.
[6] Chambers JE, Wilkinson PB, Kuras O et al (2011) Three-dimensional geophysical anatomy of an active landslide in Lias Group mudrocks, Cleveland Basin, UK. Geomorphology, 125(4):472-484.
[7] Gateau J, Caballero MAA, Dima A et al (2013) Three-dimensional optoacoustic tomography using a conventional ultrasound linear detector array: Whole-body tomographic system for small animals. Med. Phys. 40, 013302.
[8] Lin CY, Chung YC, and Liu JS (2003) Efficient Data Compression Methods for Multi-Dimensional Sparse Array Operations Based on the EKMR Scheme. IEEE Trans Comput, 52(12):1640-1646.
[9] Harwell-Boeing Collection. http://math.nist.gov/MatrixMarket/data/Harwell-Boeing/.
[10] Barrett R, Berry M, Chan TF et al (1994) Templates for the Solution of Linear Systems: Building Blocks for the Iterative Methods, 2nd Edition, SIAM.
[11] Lin CY, Chung YC, and Liu JS (2003) Efficient Data Parallel Algorithms for Multi-Dimensional Array Operations Based on the EKMR Scheme for Distributed Memory Multicomputers. IEEE Trans Parall Distr, 14(7): 624-639.
[12] Chang RG, Chung TR, and Lee JK (2001) Parallel Sparse Supports for Array Intrinsic Functions of Fortran 90. J. Supercomputing, 18(3):304-339.
[13] Lin CY and Chung YC (2007) Data Distribution Schemes of Sparse Arrays on Distributed Memory Multicomputers. J. Supercomputing, 41(1):63-87
[14] Lin CY and Chung YC (2007) Efficient Data Distribution Schemes for Multi-Dimensional Sparse Arrays. J INF SCI ENG, 23(1):314-327.
[15] Oliver T, Schmidt B, and Maskell DL (2005) Reconfigurable architectures for bio-sequence database scanning on FPGAs. IEEE Trans Circuits Syst II, 52:851–855.
[16] Szalkowski A, Ledergerber C, Krahenbuhl P et al (2008) SWPS3 – fast multi-threaded vectorized Smith-Waterman for IBM Cell/B.E. and x86/SSE2. BMC Research Notes, 1:107.
[17] Liu W, Schmidt B, Voss G et al (2006) Bio-Sequence Database Scanning on a GPU. 10.1109/IPDPS.2006.1639531.
[18] Hsu WS, Hung C.L, Lin CY et al (2013) Efficient Strategy for Compressing Sparse Matrices on Graphics Processing Units. 10.1109/ICCPS.2013.6893496.
[19] Intel Corporation (2012) Intel Xeon Phi Coprocessor Instruction Set Architecture Reference Manual. September 2012, reference number 327364-001.
[20] Cramer T, Schmidl D, Klemm K et al (2012) OpenMP Programming on Intel Xeon Phi Coprocessors: An Early Performance Comparison. http://www.lfbs.rwth-aachen.de/marc2012/07_Cramer.pdf.
[21] Liu X, Smelyanskiy M, Chow E et al (2013) Efficient sparse matrix-vector multiplication on x86-based many-core processors. 10.1145/2464996.2465013.
[22] Saule E, Kaya K and Catalyurek UV (2014) Performance Evaluation of Sparse Matrix Multiplication Kernels on Intel Xeon Phi. 10.1007/978-3-642-55224-3_52.
[23] Cierniak M and Li W. (1994) Unifying Data and Control Transformations for Distributed Shared Memory Machines. Technical Report.
[24] Press WH, Teukolsky SA, Vetterling WT et al (1996) Numerical Recipes in Fortran 90: The Art of Parallel Scientific Computing. Cambridge University Press.
[25] Horn D (2005) Stream Reduction Operations for GPGPU Applications. In: Pharr M (ed) GPU Gems 2, Addison-Wesley, Reading, MA, Chap. 36, pp 573-589.
[26] Blelloch GE (1993) Prefix Sums and Their Applications. In: Reif JH (ed) Synthesis of Parallel Algorithms, Morgan Kaufmann.
