|
[1] Y. El-Kurdi, D. Fern’andez, E. Souleimanov, D. Giannacopoulos, and W. J. Gross, “FPGA Architecture and Implementation of Sparse Matrix-Vector Multiplication for the Finite Element Method,” Computer Physics Communications, vol. 178, pp. 558–570, Apr. 2008. [2] V. Prasanna and G. Morris, “Sparse Matrix Computations on Reconfigurable Hardware,” Computer, vol. 40, no. 3, pp. 58–64, Mar. 2007. [3] A. Langville, C. Meyer, and P. Fern’andez, “Google’s Pagerank and Beyond: The Science of Search Engine Rankings,” The Mathematical Intelligencer, vol. 30, pp. 68–69, 2008. [4] V. Taylor, A. Ranade, and D. Messerschmitt, “SPAR: A New Architecture for Large Finite Element Computations,” IEEE Transactions on Computers, vol. 44, no. 4, pp. 531–545, Apr. 1995. [5] R. Vuduc, J. W. Demmel, K. A. Yelick, S. Kamil, R. Nishtala, and B. Lee, “Performance Optimizations and Bounds for Sparse Matrix-Vector Multiply,” in Proceedings of the 2002 ACM/IEEE Conference on Supercomputing, 2002, p. 26. [6] F. V’azquez, J. Fern’andez, and E. Garz’on, “A New Approach for Sparse Matrix Vector Product on NVIDIA GPUs,” Concurrency Computation Practice and Experience, vol. 23, no. 8, pp. 815–826, 2011. [7] Z. Wang, X. Xu, W. Zhao, Y. Zhang, and S. He, “Optimizing Sparse Matrix-Vector Multiplication on CUDA,” in 2nd International Conference on Education Technology and Computer (ICETC), vol. 4, Jun. 2010, pp. V4109–V4113. [8] S. Georgescu and H. Okuda, “Conjugate Gradients on Multiple GPUs,” International Journal for Numerical Methods in Fluids, vol. 64, no. 10-12, pp. 1254–1273, 2010. [9] A. El Zein and A. Rendell, “From Sparse Matrix to Optimal GPU CUDA Sparse Matrix Vector Product Implementation,” in 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing (CCGrid), May 2010, pp. 808–813. [10] N. Kapre and A. DeHon, “Optimistic Parallelization of Floating-Point Accumulation,” in IEEE Symposium on Computer Arithmetic, Jun. 2007, pp. 205–216. [11] L. Benini and G. D. Micheli, “Networks on Chips: A New SoC Paradigm,”Computer, vol. 35, no. 1, pp. 70–78, Jan. 2002. [12] D. Bertozzi and L. Benini, “Xpipes: A Network-on-Chip Architecture for Gigascale Systems-on-Chip,” IEEE Circuits and Systems Magazine, vol. 4, no. 2, pp. 18–31, Sep. 2004. [13] Y. Censor, D. Gordon, and R. Gordon, “Component Averaging: An Efficient Iterative Parallel Algorithm for Large and Sparse Unstructured Problems,” Parallel Computing, vol. 27, no. 6, pp. 777–808, 2001. [14] D.-H. Li, Y.-Y. Nie, J.-P. Zeng, and Q.-N. Li, “Conjugate Gradient Method for the Linear Complementarity Problem with S-Matrix,” Mathematical and Computer Modelling, vol. 48, pp. 918–928, Sep. 2008. [15] S. McGettrick, D. Geraghty, and C. McElroy, “An FPGA Architecture for the PageRank Eigenvector Problem,” in International Conference on Field Programmable Logic and Applications, Sep. 2008, pp. 523–526. [16] A. Kohler, G. Schley, and M. Radetzki, “Fault Tolerant Network on Chip Switching With Graceful Performance Degradation,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 29, no. 6, pp. 883–896, Jun. 2010. [17] F. Vitullo, N. L’Insalata, E. Petri, S. Saponara, L. Fanucci, M. Casula, R. Locatelli, and M. Coppola, “Low-Complexity Link Microarchitecture for Mesochronous Communication in Networks-on-Chip,” IEEE Transactions on Computers, vol. 57, no. 9, pp. 1196–1201, Sep. 2008. [18] E. Rijpkema, K. Goossens, A. Radulescu, J. Dielissen, J. van Meerbergen, P. Wielage, and E. Waterlander, “Trade-offs in the Design of a Router with both Guaranteed and Best-Effort Services for Networks on Chip,” in Design, Automation and Test in Europe Conference and Exhibition, 2003, pp. 350–355. [19] K. S. Sainarayanan, C. Raghunandan, and M. Srinivas, “Delay and Power Minimization in VLSI Interconnects with Spatio-Temporal Bus-Encoding Scheme,”in IEEE Computer Society Annual Symposium on VLSI, Mar. 2007, pp. 401–408. [20] S. Pasricha and N. Dutt, On-Chip Communication Architectures: System on Chip Interconnect. Morgan Kaufmann, 2008. [21] A. B. Kahng, “Scaling: More than Moore’s law,” IEEE Design Test of Computers, vol. 27, no. 3, pp. 86–87, May 2010. [22] R. Saleh, S. Wilton, S. Mirabbasi, A. Hu, M. Greenstreet, G. Lemieux, P. Pande, C. Grecu, and A. Ivanov, “System-on-Chip: Reuse and Integration,” Proceedings of the IEEE, vol. 94, no. 6, pp. 1050–1069, june 2006. [23] K. Lee, S.-J. Lee, and H.-J. Yoo, “Low-Power Network-on-Chip for High-Performance SoC Design,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 14, no. 2, pp. 148–160, Feb. 2006. [24] W. Wolf, “The Future of Multiprocessor Systems-on-Chips,” in Design Automation Conference, Jul. 2004, pp. 681–685. [25] A. Hemani, A. Jantsch, S. Kumar, A. Postula, J. Oeberg, M. Millberg, and D. Lindquist, “Network on a Chip: An Architecture for Billion Transistor Era,”in Proceeding of the IEEE NorChip Conference, Nov. 2000, pp. 24–31. [26] J. Hu and R. Marculescu, “DyAD - Smart Routing for Networks-on-Chip,” in Design Automation Conference, Jun. 2004, pp. 260–263. [27] J. Kim, C. Nicopoulos, D. Park, V. Narayanan, M. Yousif, and C. Das, “A Gracefully Degrading and Energy-Efficient Modular Router Architecture for On-Chip Networks,” in International Symposium on Computer Architecture, 2006, pp. 4–15. [28] J. Duato, S. Yalamanchili, and L. Ni, Interconnection Networks: An Engineering Approach. Morgan Kaufmann, 2003. [29] Xilinx, “http://www.xilinx.com.” [30] Matrix Market, “http://math.nist.gov/MatrixMarket/,” National Institute of Standards and Technology (NIST).
|