|
|