|
[1] OpenCL - The open standard for parallel programming of heterogeneous systems. [Online]. Available: http://www.khronos.org/object/opencl/
[2] V. Narasiman, M. Shebanow, C. J. Lee, R. Miftakhutdinov, O. Mutlu, and Y. N. Patt, “Improving GPU Performance via Large Warps and Two-level Warp Scheduling,” in MICRO-44: Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture, pp. 308-317, ACM, New York, NY, USA, 2011.
[3] S. Collange, “Stack-less SIMT Reconvergence at Low Cost,” ARENAIRE - Inria Grenoble Rhône-Alpes / LIP Laboratoire de l’Informatique du Parallélisme, 2011.
[4] M. Rhu and M. Erez, “The dual-path execution model for efficient GPU control flow,” in 2013 IEEE 19th International Symposium on High Performance Computer Architecture (HPCA2013), pp. 591-602, 23-27 Feb. 2013.
[5] HSA Programmer’s Reference Manual: HSAIL Virtual ISA and Programming Model, Compiler Writer’s Guide, and Object Format (BRIG), 2014.
[6] W. W. L. Fung, I. Sham, G. Yuan, and T. M. Aamodt, “Dynamic Warp Formation and Scheduling for Efficient GPU Control Flow,” in 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007), pp. 407-420, 1-5 Dec. 2007.
[7] Intel HD Graphics OpenSource PRM, 2010.
[8] A. ElTantawy, J. W. Ma, M. O'Connor, and T. M. Aamodt, “A scalable multi-path microarchitecture for efficient GPU control flow,” in 2014 IEEE 20th International Symposium on High Performance Computer Architecture (HPCA), pp. 248-259, 15-19 Feb. 2014.
[9] F. Zhang and E. H. D’Hollander, “Using hammock graphs to structure programs,” IEEE Transactions on Software Engineering, vol. 30, no. 4, pp. 231-245, April 2004.
[10] R. A. Lorie and H. R. Strong, “Method for conditional branch execution in SIMD vector processors,” US Patent 4,435,758, 1984.
[11] J. Meng, D. Tarjan, and K. Skadron, “Dynamic Warp Subdivision for Integrated Branch and Memory Divergence Tolerance,” in Proc. 37th Int’l Symp. on Computer Architecture (ISCA), pp. 235-246, 2010.
[12] J. D. Collins, D. M. Tullsen, and P. Wang, “Control Flow Optimization Via Dynamic Reconvergence Prediction,” in MICRO 37: Proceedings of the 37th Annual IEEE/ACM International Symposium on Microarchitecture, pp. 129-140, 2004.
[13] AMD SDK: AMD APP Software Development Kit. [Online]. Available: http://developer.amd.com/tools-and-sdks/opencl-zone/amd-accelerated-parallel-processing-app-sdk/
[14] S. Che et al., “Rodinia: A benchmark suite for heterogeneous computing,” in IEEE International Symposium on Workload Characterization (IISWC), pp. 44-54, 4-6 Oct. 2009.
[15] A. Kerr, G. Diamos, and S. Yalamanchili, “A characterization and analysis of PTX kernels,” in IEEE International Symposium on Workload Characterization (IISWC), pp. 3-12, 4-6 Oct. 2009.
[16] T. G. Rogers, M. O'Connor, and T. M. Aamodt, “Cache-Conscious Wavefront Scheduling,” in MICRO-45: Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture, pp. 72-83, 2012.
|