|
[1] M. E. Wolf and M. S. Lam, “A Loop Transformation Theory and an Algorithm to Maximize Parallelism,” IEEE Trans. Parallel and Distributed Systems, vol. 2, issue 4, pp. 452-471, Oct. 1991.
[2] K. Kennedy and K. S. McKinley, “Maximizing Loop Parallelism and Improving Data Locality via Loop Fusion and Distribution,” in Proceedings of the Int. Workshop on Languages and Compilers for Parallel Computing, pp. 301-320, Aug. 1993.
[3] K. G. Kumar, D. Kulkarni, and A. Basu, “Deriving Good Transformations for Mapping Nested Loops on Hierarchical Parallel Machines in Polynomial Time,” in Proceedings of the 6th International Conference on Supercomputing, pp. 82-92, Jul. 1992.
[4] A. V. Aho, M. S. Lam, R. Sethi, and J. D. Ullman, Compilers: Principles, Techniques, and Tools, 2nd ed., Addison Wesley, Oct. 2007.
[5] A. I. Holub, Compiler Design in C, Prentice Hall, Mar. 1990.
[6] J. Jones, “Abstract Syntax Tree Implementation Idioms,” in Proceedings of the 10th Conference on Pattern Languages of Programs, Sep. 2003.
[7] GNU Compiler Collection, URL: http://gcc.gnu.org/
[8] D. R. Wallace, “Low level scheduling using the hierarchical task graph,” in Proceedings of the 6th International Conference on Supercomputing (ICS), pp. 72-81, Jul. 1992.
[9] SPARK 3-Layered Intermediate Representation, URL: http://mesl.ucsd.edu/spark/methodology/HTGs.shtml/
[10] M. Girkar and C. D. Polychronopoulos, “The hierarchical task graph as a universal intermediate representation,” International Journal of Parallel Programming, vol. 22, issue 5, Oct. 1994.
[11] D. Gajski, N. Dutt, A. Wu, and S. Lin, High-Level Synthesis: Introduction to Chip and System Design, Kluwer Academic Publishers, Feb. 1992.
[12] J. Ferrante, K. J. Ottenstein, and J. D. Warren, “The Program Dependence Graph and Its Use in Optimization,” ACM Trans. Programming Languages and Systems, vol. 9, issue 3, pp. 319-349, Jul. 1987.
[13] M. J. Harrold, B. Malloy, and G. Rothermel, “Efficient Construction of Program Dependence Graphs,” in Proceedings of the International Symposium on Software Testing and Analysis (ISSTA), pp. 160-170, Jun. 1993.
[14] D. Novillo, “GCC - An Architectural Overview, Current Status and Future Directions,” Ottawa Linux Symposium (OLS), Jul. 2006.
[15] LANCE Retargetable C Compiler, URL: http://www.lancecompiler.com/
[16] R. Leupers, “LANCE: A C Compiler Platform for Embedded Processors,” in Embedded Systems/Embedded Intelligence, Feb. 2001.
[17] K. Karuri, M. A. Al Faruque, S. Kraemer, R. Leupers, G. Ascheid, and H. Meyr, “Fine-grained Application Source Code Profiling for ASIP Design,” in Proceedings of the 42nd Design Automation Conference, pp. 329-334, Jun. 2005.
[18] K. Karuri, C. Huben, R. Leupers, G. Ascheid, and H. Meyr, “Memory Access Micro-Profiling for ASIP Design,” in Proceedings of the 3rd IEEE International Workshop on Electronic Design, Test and Applications, pp. 255-262, Jan. 2006.
[19] Edison Design Group (EDG), URL: http://www.edg.com/
[20] Valgrind, URL: http://valgrind.org/
[21] J. Engblom, A. Ermedahl, M. Nolin, J. Gustafsson, and H. Hansson, “Worst-case execution-time analysis for embedded realtime systems,” International Journal on Software Tools for Technology Transfer, vol. 4, issue 4, pp. 437-455, Aug. 2003.
[22] M. I. Gordon, W. Thies, and S. P. Amarasinghe, “Exploiting coarse-grained task, data, and pipeline parallelism in stream programs,” in Proceedings of the 12th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), pp. 151-162, Oct. 2006.
[23] S. Rul, H. Vandierendonck, and K. D. Bosschere, “Extracting Coarse-Grain Parallelism in General-Purpose Programs,” in Proceedings of the 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pp. 281-282, Feb. 2008.
[24] E. Ozer, S. Banerjia, and T. Conte, “Unified assign and schedule: A new approach to scheduling for clustered register file microarchitectures,” in Proceedings of the 31st Annual International Symposium on Microarchitecture, pp. 308-315, Dec. 1998.
[25] K. Kailas, K. Ebcioglu, and A. Agrawala, “CARS: A new code generation framework for clustered ILP processors,” in Proceedings of the 7th International Symposium on High-Performance Computer Architecture, pp. 133-142, Feb. 2001.
[26] M. Chu, K. Fan, and S. Mahlke, “Region-based hierarchical operation partitioning for multicluster processors,” in Proceedings of the SIGPLAN 2003 Conference on Programming Language Design and Implementation, pp. 300-311, Jun. 2003.
[27] V. Paxson, Flex: The Fast Lexical Generator, URL: http://www.gnu.org/software/flex/
[28] C. Donnelly and R. Stallman, Bison: GNU parser generator, URL: http://www.gnu.org/software/bison/
[29] I. Sommerville, Software Engineering, 8th ed., Addison Wesley, Jun. 2006.
[30] Gcov documentation, URL: http://gcc.gnu.org/onlinedocs/gcc/Gcov.html
[31] J. Fenlason and R. Stallman, The GNU Profiler, URL: http://www.gnu.org/software/binutils/manual/gprof-2.9.1/gprof.html
[32] S. Horwitz, T. Reps, and D. Binkley, “Interprocedural Slicing Using Dependence Graphs,” ACM Trans. Programming Languages and Systems, vol. 12, issue 1, pp. 26-60, Jan. 1990.
[33] R. Leupers, O. Wahlen, M. Hohenauer, T. Kogel, and P. Marwedel, “An Executable Intermediate Representation for Retargetable Compilation and High-Level Code Optimization,” in Proceedings of the Int. Workshop on Systems, Architectures, Modeling, and Simulation (SAMOS), pp. 120-125, Jul. 2003.
[34] M. Chu, R. Ravindran, and S. Mahlke, “Data Access Partitioning for Fine-grain Parallelism on Multicore Architectures,” in Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture, pp. 369-380, Dec. 2007.
|