|
[1] D. Abramson, J. Jackson, S. Muthrasanallur, G. Neiger, G. Regnier, R. Sankaran I. Schoinas, R. Uhlig, B. Vembu, and J. Wiegert. Intel virtualization technology for directed i/o. Intel technology journal, 10(3), 2006. [2] A. AMD. I/o virtualization technology spec., feb. 2007. [3] T.W. Barr, A. L. Cox, and S. Rixner. Translation caching: skip, don’t walk (the page table). In ACM SIGARCH Computer Architecture News, volume 38, pages 48–59. ACM, 2010. [4] A. Basu, M. D. Hill, and M. M. Swift. Reducing memory reference energy with opportunistic virtual caching. In ACM SIGARCH Computer Architecture News, volume 40, pages 297–308. IEEE Computer Society, 2012. [5] R. Bhargava, B. Serebrin, F. Spadini, and S. Manne. Accelerating two-dimensional page walks for virtualized systems. In ACM SIGARCH Computer Architecture News, volume 36, pages 26–35. ACM, 2008. [6] H. Bhatnagar. Advanced ASIC Chip Synthesis: Using SynopsysR Design CompilerTM Physical CompilerTM and PrimeTimeR . Springer Science & Business Media, 2007. [7] S. Chatterjee and S. Sen. Cache-efficient matrix transposition. In High-Performance Computer Architecture, 2000. HPCA-6. Proceedings. Sixth International Symposium on, pages 195–205. IEEE, 2000. [8] Y.-k. Choi, J. Cong, Z. Fang, Y. Hao, G. Reinman, and P. Wei. A quantitative analysis on microarchitectures of modern cpu-fpga platforms. In DAC, 2016 53nd ACM/EDAC/IEEE, pages 1–6. IEEE, 2016. [9] J. Cong, M. A. Ghodrat, M. Gill, B. Grigorian, K. Gururaj, and G. Reinman Accelerator-rich architectures: Opportunities and progresses. In Proceedings of the 51st Annual Design Automation Conference, pages 1–6. ACM, 2014. [10] J. Cong, M. A. Ghodrat, M. Gill, B. Grigorian, and G. Reinman. Architecture support for accelerator-rich cmps. In Design Automation Conference (DAC), 2012 49th ACM/EDAC/IEEE, pages 843–849. IEEE, 2012. [11] H. Esmaeilzadeh, E. Blem, R. St Amant, K. Sankaralingam, and D. Burger. Dark silicon and the end of multicore scaling. In ACM SIGARCH Computer Architecture News, volume 39, pages 365–376. ACM, 2011. [12] H. Foundation. Hsa platform system architecture spec. 1.0, 2015. [13] Y. Hao, Z. Fang, G. Reinman, and J. Cong. Supporting address translation for accelerator-centric architectures. In HPCA, 2017 IEEE International Symposium on, pages 37–48. IEEE, 2017. [14] B. Pichai, L. Hsu, and A. Bhattacharjee. Architectural support for address translation on gpus: Designing memory management units for cpu/gpus with unified address spaces. In ACM SIGARCH Computer Architecture News, volume 42, pages 743–758. ACM, 2014. [15] B. Reagen, R. Adolf, Y. S. Shao, G.-Y.Wei, and D. Brooks. Machsuite: Benchmarks for accelerator design and customized architectures. In IISWC, 2014. IEEE, 2014. [16] Y. S. Shao and D. Brooks. Research infrastructures for hardware accelerators. Synthesis Lectures on Computer Architecture, 10(4):1–99, 2015. [17] Y. S. Shao, B. Reagen, G.-Y. Wei, and D. Brooks. Aladdin: A pre-rtl, power performance accelerator simulator enabling large design space exploration of customized architectures. In Computer Architecture (ISCA), 2014 ACM/IEEE 41st International Symposium on, pages 97–108. IEEE, 2014. [18] Y. S. Shao, S. L. Xi, V. Srinivasan, G.-Y. Wei, and D. Brooks. Co-designing accelerators and soc interfaces using gem5-aladdin. In MICRO, 2016. IEEE, 2016. [19] S. Thoziyoor, N. Muralimanohar, J. H. Ahn, and N. P. Jouppi. Cacti 5.1. Technical report, Technical Report HPL-2008-20, HP Labs, 2008.
|