|
|