|
[1] Y. El-Kurdi, W. J. Gross, and D. Giannacopoulos, “Sparse matrix-vector multiplication for finite element method matrices on FPGAs,” in 14th Annual IEEE Symposium on Field-Programmable Custom Computing Machines, 2006.
[2] A. Langville, C. Meyer, and P. Fernandez, “Google’s PageRank and beyond: The science of search engine rankings,” The Mathematical Intelligencer, vol. 30, pp. 68–69, 2008.
[3] I. Lee, “Efficient sparse matrix vector multiplication using compressed graph,” in IEEE SoutheastCon, pp. 328–331, 2010.
[4] L. Yavits, A. Morad, and R. Ginosar, “Sparse matrix multiplication on an associative processor,” IEEE Transactions on Parallel and Distributed Systems, 2013.
[5] E. Im and K. Yelick, “Optimizing the performance of sparse matrix-vector multiplication,” in Proceedings of the 1999 ACM/IEEE Conference on Supercomputing (CDROM), p. 30, ACM, 1999.
[6] A. Pinar and M. Heath, “Improving performance of sparse matrix-vector multiplication,” University of California, Berkeley, 2000.
[7] S. Williams et al., “Optimization of sparse matrix-vector multiplication on emerging multicore platforms,” Parallel Computing, vol. 35, no. 3, pp. 178–194, 2009.
[8] E. Im and K. Yelick, “Optimizing sparse matrix vector multiplication on SMPs,” in SIAM Conference on Parallel Processing for Scientific Computing (PPSC), 1999.
[9] E. Im and K. Yelick, “Model-based memory hierarchy optimizations for sparse matrices,” in Workshop on Profile and Feedback-Directed Compilation, Oct. 1998.
[10] R. Agarwal, F. Gustavson, and M. Zubair, “A high performance algorithm using pre-processing for the sparse matrix-vector multiplication,” in Proceedings of Supercomputing ’92, Nov. 1992, pp. 32–41.
[11] R. Barrett, M. Berry, T. Chan, J. Demmel, J. Donato, J. Dongarra, V. Eijkhout, R. Pozo, C. Romine, and H. van der Vorst, Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods, 2nd Edition. Philadelphia, PA: SIAM, 1994.
[12] S. Toledo, “Improving memory-system performance of sparse matrix-vector multiplication,” IBM Journal of Research and Development, vol. 41, no. 6, pp. 711–725, 1997.
[13] R. Vuduc, J. Demmel, K. Yelick, S.K., R. Nishtala, and B. Lee, “Performance optimizations and bounds for sparse matrix-vector multiply,” in Proceedings of Supercomputing, 2002.
[14] N. Bell and M. Garland, “Efficient sparse matrix-vector multiplication on CUDA,” NVIDIA Technical Report NVR-2008-3004, Dec. 2008.
[15] G. Morris and V. Prasanna, “Sparse matrix computations on reconfigurable hardware,” Computer, vol. 40, no. 3, pp. 58–64, Mar. 2007.
[16] J. Sun, G. Peterson, and O. Storaasli, “Sparse matrix-vector design on FPGAs,” in 15th Annual IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM), pp. 349–352, 2007.
[17] L. Zhuo and V. Prasanna, “Sparse matrix-vector multiplication on FPGAs,” in Proceedings of the 2005 ACM/SIGDA 13th International Symposium on Field-Programmable Gate Arrays, pp. 63–74, ACM, Mar. 2005.
[18] G. Qing, X. Guo, R. Patel, E. Ipek, and E. Friedman, “AP-DIMM: Associative computing with STT-MRAM,” in ISCA, 2013.
[19] N. Kapre and A. DeHon, “Optimistic parallelization of floating-point accumulation,” in IEEE Symposium on Computer Arithmetic, pp. 205–216, Jun. 2007.
[20] L. Benini and G. De Micheli, “Networks on chips: A new SoC paradigm,” Computer, vol. 35, no. 1, pp. 70–78, Jan. 2002.
[21] D. Bertozzi and L. Benini, “Xpipes: A network-on-chip architecture for gigascale systems-on-chip,” IEEE Circuits and Systems Magazine, vol. 4, no. 2, pp. 18–31, Sep. 2004.
[22] C.-C. Sun, J. Gotze, H.-Y. Jheng, and S.-J. Ruan, “Sparse matrix-vector multiplication on network-on-chip,” Advances in Radio Science, vol. 8, pp. 289–294, 2010.
[23] H.-Y. Jheng, C.-C. Sun, S.-J. Ruan, and J. Gotze, “FPGA acceleration of sparse matrix-vector multiplication based on network-on-chip,” in 19th European Signal Processing Conference (EUSIPCO), Barcelona, Aug. 2011, pp. 744–748.
[24] A. Ogielski and W. Aiello, “Sparse matrix computations on parallel processor arrays,” SIAM J. Sci. Comput., vol. 14, pp. 519–530, 1992.
[25] J. Lewis and R. van de Geijn, “Distributed memory matrix-vector multiplication and conjugate gradient algorithms,” in Supercomputing ’93. Proceedings, 1993, pp. 484–492.
[26] B. Hendrickson, R. Leland, and S. Plimpton, “An efficient parallel algorithm for matrix-vector multiplication,” International Journal of High Speed Computing, vol. 7, pp. 73–88, 1995.
[27] U. Catalyurek and C. Aykanat, “A hypergraph-partitioning approach for coarse-grain decomposition,” in Supercomputing, ACM/IEEE 2001 Conference, Nov. 2001, p. 42.
[28] R. Boisvert, R. Pozo, K. Remington, R. Barrett, and J. Dongarra, “Matrix Market: A web resource for test matrix collections,” in The Quality of Numerical Software: Assessment and Enhancement, R. Boisvert, Ed. London: Chapman & Hall, 1997, pp. 125–137.
[29] U. Catalyurek and C. Aykanat, “PaToH: A multilevel hypergraph partitioning tool, version 3.0,” Bilkent University, Department of Computer Engineering, Ankara, Turkey, Tech. Rep., 1999. [Online]. Available: http://bmi.osu.edu/ umit/software.html
[30] A. Mansour and J. Gotze, “An OMNeT++ based network-on-chip simulator for embedded systems,” in IEEE Asia Pacific Conference on Circuits and Systems (APCCAS 2012), Kaohsiung, Taiwan, Dec. 2012, pp. 364–367.
|