|
[1]C. R. Baugh, and B. A. Wooley, "A two’s complement parallel array multiplication algorithm," IEEE Transaction on Computers, vol. C-22, pp. 1045-1047, 1973. [2]F. Bensaali, A. Amira, and A. Bouridane, "Accelerating matrix product on reconfigurable hardware for image processing applications," IEE Proceedings of Circuits, Devices and Systems, vol. 152, no. 3, pp. 236-246, 2005. [3]G. Choe and E. E. Swartzlander, Jr., "Merged Arithmetic for computing wavelet transforms", in Proceedings of the 8th Great Lakes Symposium on VLSI, 1998, pp. 196-201. [4]G. Choe and E. E. Swartzlander, Jr., "Complexity of merged two’s complement multiplier-adders," in Proceedings of the 35th IEEE Midwest Symposium on Circuits and Systems, 1999, vol. 1, pp. 384-387. [5]L. Dadda, "Some schemes for parallel multipliers," Alta Frequenza, vol. 34, pp. 349-356, 1965. [6]A. Fayed, W. Elgharbawy and M. Bayoumi, "A Data Merging Technique High-Speed Low-Power Multiply Accumulate Units, " in Procceedings of the International Conference on Acoustics, Speech, and Signal Processing, 2004, pp. V- 145-8. [7]K. A. Feiste and E. E. Swartzlander, Jr., "High-speed VLSI implementation of FIR lattice filters," in Proceedings of the 29th Asilomar Conference on Signals, Systems and Computers, 1995, pp. 127-131. [8]K. A. Feiste and E. E. Swartzlander, Jr., "High-speed VLSI implementation of IIR lattice filters," in Proceedings of the 30th Asilomar Conference on Signals, Systems and Computers, 1996, pp. 1057-1062. [9]K. A. Feiste and E. E. Swartzlander, Jr., "Merged arithmetic revisited," in Proceedings of the IEEE Workshop on Signal Processing Systems, 1997, pp. 212-221. [10]J. Gu, C.-H. Chang and K.-S. Yeo, "Algorithm and Architecture for a High Density, Low Power Scalar Product Macrocell," IEE Proceedings on Computer Digital Technology, vol. 151, no. 2, pp. 161-172, 2004. [11]R. S. Grover, W. Shang, and Q. Li, "Bit-level two’s complement matrix multiplication," Integration, the VLSI Journal, vol. 33, no. 1, pp. 3-21, 2002. [12]K. Hwang and F. A. Briggs, Computer Architecture and Parallel Processing, McGraw-Hill, New York, 1984. [13]H.-P. Huang and D.-R. Duh, "Fast computation algorithm for robot dynamics and its implementation," in Proceedings of the IEEE International Symposium on Industrial Electronics, 1992, pp. 352-356. [14]D. L. Jones, Fixed-Point Number Representation, Connexions Web site. http://cnx.org/content/m11930/1.2/, Dec 28, 2004. [15]J.-W. Jang, S. B. Choi, and V. K. Prasanna, "Energy- and time-efficient matrix multiplication on FPGAs," IEEE Transaction on Very Large Scale Integration Systems, vol. 13, no. 11, pp. 1305-1319, November 2005. [16]R. Lin, "A reconfigurable low-power high-performance matrix multiplier design," in Proceedings of the IEEE First International Symposium on Quality Electronic Design, 2000, pp. 321-328. [17]E. L. Leiss, Parallel and Vector Computing, McGraw-Hill, New York, 1995. [18]B. Parhami, Computer Arithmetic: Algorithms and Hardware Designs, Oxford Univ. Press, New York, 2000. [19]V. Y. Pan, "How can we speed-up matrix multiplication?" SIAM Review, vol. 26, no. 3, pp.393-415, 1984. [20]R. Scrofano, S. Choi and V. K. Prasanna, "Energy Efficiency of FPGAs and Programmable Processors for Matrix Multiplication", in Proceedings of the IEEE International Conference on Field-Programmable Technology, 2002, pp. 422-425. [21]E. E. Swartzlander, Jr., "Merged arithmetic," IEEE Transaction on Computers, vol. C-29, no. 10, pp. 946-950, October 1980. [22]C. S. Wallace, "A suggestion for a fast multiplier", IEEE Transaction on Electronic Computing, vol. EC-13, pp. 14-17, 1964. [23]Z. Ye and C.-H. Chang, "A hybrid CSA tree for merged arithmetic architecture of FIR filter," in Proceedings of the 3rd International Symposium on Image and Signal Processing and Analysis, 2003, pp. 449-453. [24]L. Zhuo and V. K. Prasanna, "High performance linear algebra operations on reconfigurable systems," in Proceedings of the 2005 ACM/IEEE conference on Supercomputing, 2005.
|