|
[1] Advanced Micro Devices. (2013) AMD OpenCL™ Accelerated Parallel Processing SDK. [Online]. http://developer.amd.com/tools-and-sdks/opencl-zone/opencl-tools-sdks/amd-accelerated-parallel-processing-app-sdk/
[2] A. Bakhoda, G.L. Yuan, W.W.L. Fung, H. Wong, and T.M. Aamodt, "Analyzing CUDA workloads using a detailed GPU simulator," Performance Analysis of Systems and Software, pp. 163-174, 2009.
[3] Fabrice Bellard, "QEMU, a fast and portable dynamic translator," USENIX Annual Technical Conference, pp. 41-46, 2005.
[4] Nathan Binkert et al., "The gem5 Simulator," ACM SIGARCH Computer Architecture News, pp. 1-7, May 2011.
[5] David Brooks, Vivek Tiwari, and Margaret Martonosi, "Wattch: A Framework for Architectural-Level Power Analysis and Optimization," Computer Architecture, pp. 83-94, June 2000.
[6] Shuai Che et al., "Rodinia: A benchmark suite for heterogeneous computing," IEEE International Symposium on Workload Characterization, pp. 44-54, 2009.
[7] S. Collange, M. Daumas, D. Defour, and D. Parello, "Barra: A Parallel Functional Simulator for GPGPU," Modeling, Analysis & Simulation of Computer and Telecommunication Systems, pp. 351-360, 2010.
[8] Jiun-Hung Ding, Po-Chun Chang, Wei-Chung Hsu, and Yeh-Ching Chung, "PQEMU: A Parallel System Emulator Based on QEMU," IEEE International Conference on Parallel and Distributed Systems, pp. 276-283, 2011.
[9] HSA Foundation, "HSA Platform System Architecture Specification," pp. 1-62, Mar. 2014.
[10] HSA Foundation, "HSA Programmer’s Reference Manual: HSAIL Virtual ISA and Programming Model, Compiler Writer’s Guide, and Object Format (BRIG)," pp. 1-418, June 2013.
[11] HSA Foundation, "HSA Runtime Programmer’s Reference Manual," pp. 1-117, June 2014.
[12] Shih-Hao Hung, Tei-Wei Kuo, Chi-Sheng Shih, and Chia-Heng Tu, "System-wide profiling and optimization with virtual machines," Asia and South Pacific Design Automation Conference, pp. 395-400, 2012.
[13] Khronos OpenCL Working Group, "The OpenCL Specification Version: 2.0," pp. 1-284, Mar. 2014.
[14] Jingwen Leng et al., "GPUWattch: Enabling Energy Optimizations in GPGPUs," International Symposium on Computer Architecture, pp. 487-498, June 2013.
[15] NVIDIA Corporation. (2014) Parallel Programming and Computing Platform | CUDA | NVIDIA. [Online]. http://www.nvidia.com/object/cuda_home_new.html
[16] J. Power, J. Hestness, M. Orr, M. Hill, and D. Wood, "gem5-gpu: A Heterogeneous CPU-GPU Simulator," Computer Architecture Letters, p. 1, 2014.
[17] John A. Stratton, Sam S. Stone, and Wen-mei W. Hwu, "MCUDA: An Efficient Implementation of CUDA Kernels for Multi-core CPUs," Languages and Compilers for Parallel Computing, pp. 16-30, 2008.
[18] R. Ubal, J. Sahuquillo, S. Petit, and P. Lopez, "Multi2Sim: A Simulation Framework to Evaluate Multicore-Multithreaded Processors," Computer Architecture and High Performance Computing, pp. 62-68, 2007.
[19] Z. Wang et al., "COREMU: a scalable and portable parallel full-system emulator," Principles and Practice of Parallel Programming, pp. 213-222, 2011.
|