|
[1] Cullum JK and Willoughby RA (1985) Lanczos Algorithms for Large Symmetric Eignenvalue Computations. Birkhauser, Boston. [2] Golub GH and Loan CFV (1989) Matrix Computations, 2nd Edition, The John Hopkins University Press, Baltimore, Maryland 21218. [3] Duff I, Grimes R, and Lewis J. (1989) Sparse matrix test problems. ACM Trans Math Softw, 15(1): 1-14. [4] McKinley KS, Carr S, and Tseng CW (1996) Improving Data Locality with Loop Transformations. ACM Trans Program Lang Syst, 18(4): 424-453. [5] Lin CY, Liu JS, and Chung YC (2002) Efficient Representation Scheme for Multi-Dimensional Array Operations. IEEE Trans Comput, 51(3):327-345. [6] Chambers JE, Wilkinson PB, Kuras O et al (2011) Three-dimensional geophysical anatomy of an active landslide in Lias Group mudrocks, Cleveland Basin, UK. Geomorphology, 125(4):472-484. [7] Gateau J, Caballero MAA, Dima A et al (2013) Three-dimensional optoacoustic tomography using a conventional ultrasound linear detector array: Whole-body tomographic system for small animals. Med. Phys. 40, 013302. [8] Lin CY, Chung YC, and Liu JS (2003) Efficient Data Compression Methods for Multi-Dimensional Sparse Array Operations Based on the EKMR Scheme. IEEE Trans Comput, 52(12):1640-1646. [9] Harwell-Boeing Collection. http://math.nist.gov/MatrixMarket/data/Harwell-Boeing/. [10] Barrett R, Berry M, Chan TF et al (1994) Templates for the Solution of Linear Systems: Building Blocks for the Iterative Methods, 2nd Edition, SIAM. [11] Lin CY, Chung YC, and Liu JS (2003) Efficient Data Parallel Algorithms for Multi-Dimensional Array Operations Based on the EKMR Scheme for Distributed Memory Multicomputers. IEEE Trans Parall Distr, 14(7): 624-639. [12] Chang RG, Chung TR, and Lee JK (2001) Parallel Sparse Supports for Array Intrinsic Functions of Fortran 90. J. Supercomputing, 18(3):304-339. [13] Lin CY and Chung YC (2007) Data Distribution Schemes of Sparse Arrays on Distributed Memory Multicomputers. J. Supercomputing, 41(1):63-87 [14] Lin CY and Chung YC (2007) Efficient Data Distribution Schemes for Multi-Dimensional Sparse Arrays. J INF SCI ENG, 23(1):314-327. [15] Oliver T, Schmidt B, and Maskell DL (2005) Reconfigurable architectures for bio-sequence database scanning on FPGAs. IEEE Trans Circuits Syst II, 52:851–855. [16]Szalkowski A, Ledergerber C, Krahenbuhl P et al (2008) SWPS3 – fast multi-threaded vectorized Smith-Waterman for IBM Cell/B.E. and x86/SSE2. BMC Research Notes, 1:107. [17] Liu W, Schmidt B, Voss G et al (2006) Bio-Sequence Database Scanning on a GPU. 10.1109/IPDPS.2006.1639531. [18] Hsu WS, Hung C.L, Lin CY et al (2013) Efficient Strategy for Compressing Sparse Matrices on Graphics Processing Units. 10.1109/ICCPS.2013.6893496. [19] Intel Corporation, Intel R Xeon PhiTM Coprocessor Instruction Set Architecture Reference Manual. September 2012, reference number 327364-001. [20] Cramer T, Schmidl D, Klemm K et al (2012) OpenMP Programming on Intel R Xeon Phi TM Coprocessors: An Early Performance Comparison. http://www.lfbs.rwth-aachen.de/marc2012/07_Cramer.pdf. [21] Liu X, Smelyanskiy M, Chow E et al (2013) Efficient sparse matrix-vector multiplication on x86-based many-core processors. 10.1145/2464996.2465013. [22] Saule E, Kaya K and Catalyurek UV (2014) Performance Evaluation of Sparse Matrix Multiplication Kernels on Intel Xeon Phi. 10.1007/978-3-642-55224-3_52. [23] Cierniak M and Li W. (1994) Unifying Data and Control Transformations for Distributed Shared Memory Machines. Technical Report. [24] Press WH, Teukolsky SA, Vetterling WT et al (1996) Numerical Recipes in Fortran 90: The Art of Parallel Scientific Computing. Cambridge University Press. [25] D. Horn, “Stream reduct o operat o s for GPGPU applications,” In GPU Gems 2, M. Pharr, Ed. Addison Wesley, Reading, MA, Chap. 36, 573–589. [26]G.E. Blelloch, “Prefix Sums a d The r Applications”. I J.H. Reif, editor, Synthesis of Parallel Algorithm, Morgan Kaufmann, 1993.
|