|
[1] Jaekyu Lee and Hyesoon Kim. TAP: A TLP-aware cache management policy for a CPU-GPU heterogeneous architecture. In High Performance Computer Architecture (HPCA), 2012 IEEE 18th International Symposium on, pages 1–12, Feb 2012.
[2] Andre Rigland Brodtkorb, Trond Runar Hagen, and Martin Lilleeng Satra. Graphics processing unit (GPU) programming strategies and trends in GPU computing. J. Parallel Distrib. Comput., 73(1):4–13, 2013.
[3] Intel. Intel microarchitecture code name Sandy Bridge.
[4] AMD. AMD accelerated processing unit.
[5] Nvidia. Nvidia Tegra.
[6] Alex Settle, Dan Connors, Enric Gibert, and Antonio Gonzalez. A dynamically reconfigurable cache for multithreaded processors. J. Embedded Comput., 2(2):221–233, April 2006.
[7] Lisa R. Hsu, Steven K. Reinhardt, Ravishankar Iyer, and Srihari Makineni. Communist, utilitarian, and capitalist cache policies on CMPs: Caches as a shared resource. In Proceedings of the 15th International Conference on Parallel Architectures and Compilation Techniques, PACT '06, pages 13–22, New York, NY, USA, 2006. ACM.
[8] G.E. Suh, S. Devadas, and L. Rudolph. A new memory monitoring scheme for memory-aware scheduling and partitioning. In High-Performance Computer Architecture, 2002. Proceedings. Eighth International Symposium on, pages 117–128, Feb 2002.
[9] G.E. Suh, L. Rudolph, and S. Devadas. Dynamic partitioning of shared cache memory. The Journal of Supercomputing, 28(1):7–26, 2004.
[10] Moinuddin K. Qureshi and Yale N. Patt. Utility-based cache partitioning: A low-overhead, high-performance, runtime mechanism to partition shared caches. In Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 39, pages 423–432, Washington, DC, USA, 2006. IEEE Computer Society.
[11] Miquel Moreto, Francisco J. Cazorla, Alex Ramirez, and Mateo Valero. MLP-aware dynamic cache partitioning. In Per Stenstrom, Michel Dubois, Manolis Katevenis, Rajiv Gupta, and Theo Ungerer, editors, High Performance Embedded Architectures and Compilers, volume 4917 of Lecture Notes in Computer Science, pages 337–352. Springer Berlin Heidelberg, 2008.
[12] Guang Suo, Xuejun Yang, Guanghui Liu, Junjie Wu, Kun Zeng, Baida Zhang, and Yisong Lin. IPC-based cache partitioning: An IPC-oriented dynamic shared cache partitioning mechanism. In Convergence and Hybrid Information Technology, 2008. ICHIT '08. International Conference on, pages 399–406, Aug 2008.
[13] Chenjie Yu and P. Petrov. Off-chip memory bandwidth minimization through cache partitioning for multi-core platforms. In Design Automation Conference (DAC), 2010 47th ACM/IEEE, pages 132–137, June 2010.
[14] Vineeth Mekkat, Anup Holey, Pen-Chung Yew, and Antonia Zhai. Managing shared last-level cache in a heterogeneous multicore processor. In Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques, PACT '13, pages 225–234, Piscataway, NJ, USA, 2013. IEEE Press.
[15] Xing Lin and Rajeev Balasubramonian. Refining the utility metric for utility-based cache partitioning. In 9th Annual Workshop on Duplicating, Deconstructing, and Debunking, 2011.
[16] M. Garrido and J. Grajal. Continuous-flow variable-length memoryless linear regression architecture. Electronics Letters, 49(24):1567–1569, November 2013.
[17] Pablo Royer del Barrio, Miguel Angel Sanchez Marcos, Marisa Lopez Vallejo, and Carlos Alberto Lopez Barrio. Area-efficient linear regression architecture for real-time signal processing on FPGAs. 2011.
[18] Elizabeth Holmes, Eric Ward, and Kellie Wills. MARSS: Multivariate Autoregressive State-Space Modeling, 2013. R package version 3.9.
[19] Elizabeth E. Holmes, Eric J. Ward, and Kellie Wills. MARSS: Multivariate autoregressive state-space models for analyzing time-series data. The R Journal, 4(1):30, 2012.
[20] Ali Bakhoda, George L. Yuan, Wilson W. L. Fung, Henry Wong, and Tor M. Aamodt. Analyzing CUDA workloads using a detailed GPU simulator. In ISPASS, pages 163–174. IEEE, 2009.
[21] Po-Han Wang, Chien-Wei Lo, Chia-Lin Yang, and Yu-Jung Cheng. A cycle-level SIMT-GPU simulation framework. In Rajeev Balasubramonian and Vijayalakshmi Srinivasan, editors, ISPASS, pages 114–115. IEEE, 2012.
[22] Paul Rosenfeld, Elliott Cooper-Balis, and Bruce Jacob. DRAMSim2: A cycle accurate memory system simulator. Computer Architecture Letters, 10(1):16–19, 2011.
[23] J. Lotze, P.D. Sutton, and H. Lahlou. Many-core accelerated LIBOR swaption portfolio pricing. In High Performance Computing, Networking, Storage and Analysis (SCC), 2012 SC Companion, pages 1185–1192, Nov 2012.
[24] Yuejian Xie and Gabriel H. Loh. PIPP: Promotion/insertion pseudo-partitioning of multi-core shared caches. In Proceedings of the 36th Annual International Symposium on Computer Architecture, ISCA '09, pages 174–183, New York, NY, USA, 2009. ACM.
[25] Aamer Jaleel, William Hasenplaugh, Moinuddin Qureshi, Julien Sebot, Simon Steely, Jr., and Joel Emer. Adaptive insertion policies for managing shared caches. In Proceedings of the 17th International Conference on Parallel Architectures and Compilation Techniques, PACT '08, pages 208–219, New York, NY, USA, 2008. ACM.
[26] Aamer Jaleel, Kevin B. Theobald, Simon C. Steely, Jr., and Joel Emer. High performance cache replacement using re-reference interval prediction (RRIP). In Proceedings of the 37th Annual International Symposium on Computer Architecture, ISCA '10, pages 60–71, New York, NY, USA, 2010. ACM.
|