|
[1] Y. H. Hu, Programmable Digital Signal Processors – Architecture, Programming, and Applications, Marcel Dekker Inc., 2002 [2] R.B. Lee, “Multimedia extensions for general-purpose processors,” in Proc. IEEE Workshop Signal Processing Systems, pp. 9-23, Nov.1997. [3] K. Diefendorff, P.K. Dubey, R. Hochsprung, and H. Scales, “AltiVec extension to PowerPC accelerates media processing,” IEEE Micro, vol. 20, no. 2, pp. 85-95, Mar./Apr. 2000. [4] J.A Kahle et al., “Introduction to the Cell multiprocessor,” IBM J. Research and Development, vol. 49, no. 4/5, July 2005, pp.589-604 [5] The Cell architecture. [Online]. Available: http://domino.watson.ibm.com/comm/research.nsf/pages/r.arch.innovation.html [6] Cell Broadband Engine Programming Handbook version 1.1, IBM, 2007 [7] B. Flachs, S. Asano, S. H. Dhong, H. P. Hofstee, G. Gervais, R. Kim, T. Le, et. al., "The microarchitecture of the synergistic processor for a Cell processor," IEEE J. Solid State Circuits 41, No. 1, 63-70 (2006). [8] Synergistic Processor Unit Instruction Set Architecture, Version 1.2, IBM Corporation, Sony Computer Entertainment Corporation, and Toshiba Corporation. [Online]. http://www.ibm.com/chips/techlib/techlib.nsf/techdecs/ 76CA6C7304210F3987257060006F2C44/$file/ SPU_ISA_v1.2_27Jan2007_pub.pdf. [9] J. Leenstra et al., “The vector fixed point unit of the streaming processor of a CELL processor,” presented at the Symp. VLSI Circuits, Kyoto, Japan, 2005. [10] H. Oh et al., “A fully-pipelined single-pipelined single-precision floating point unit in the streaming processing unit of a CELL processor,” presented at the Symp. VLSI Circuits, Kyoto, Japan, 2005. [11] S. Krithivasan and M.J. Schutle, “Multiplier Architecture for Media Processing,” in Proc. 37th Asilomar Conf. Signals, Systems, and Computers, pp. 2193-2197, Nov. 2003 [12] Suzuki, K. et al.,”A 2000-MOPS embedded RISC processor with a Rambus DRAM controller,” IEEE J. Solid-State Circuit, vol. 34, pp. 1010-1021, 1999 [13] A. Terechko, M. Garg, and H. Corporaal, “Evaluation of speed and area of clustered VLIW Processors,” in Proc. VLSID, pp.557-563, 2005 [14] P.C. Hsiao, T. J. Lin, C. W. Liu, and C. W. Jen, “Efficient datapath design for clustered &pipelined digital signal processors,” in Proc. VLSI design/CAD, Aug. 2005 [15] C. Leiserson, F. Rose, and J. Saxe, “Optimizing synchronous circuitry by retiming,” in Third Caltech Conference on VLSI, pp. 87-116, 1983 [16] C. Leiserson, F. Rose, and J. Saxe, “Retiming synchronous circuitry,” Algorithmica, vol.6, pp. 5-35, 1911 [17] K. K. Parhi, VLSI Digital Signal Processing Systems – Design and Implementation, John Wiley & Sons, 1999 [18] A. Hoffmann, H. Meyr, and R. Leupers, Architecture Exploration for Embedded Processors with LISA, Kluwer Academic Publishers, 2002
|