( 您好!臺灣時間:2024/02/26 11:14
字體大小: 字級放大   字級縮小   預設字形  
回查詢結果 :::


研究生(外文):Lung, Tai-Wen
論文名稱:Data Locality Optimizations for Multi-Level Caches in Java Multi-Core Compiler
指導教授(外文):Chung, Yeh-Ching
  • 被引用被引用:0
  • 點閱點閱:291
  • 評分評分:
  • 下載下載:0
  • 收藏至我的研究室書目清單書目收藏:0
科技的進步日新月異,傳統的單核心處理器(Single-Core Processor)已經無法滿足使用者的需求,為了獲得更快更好的效能,處理器正朝著多核心架構(Multi-Core Architecture)發展;多核心架構設計可讓一個處理晶片擁有更多的處理單元(Processor Unit)以及多階層的記憶體架構(例:階層一快取、階層二快取、階層三快取以及主記憶體)。而為了能善用多核心架構所帶來的優勢,程式設計者在設計程式的同時,必須要考量程式平性化以及資料區域性等議題。
在程式平行化方面,我們使用OpenMP為主要工具。OpenMP是一個廣為人知的程式開發介面,目前提供了C/C++/Fortran版本,程式設計者可透過OpenMP提供的語法以及函式開發平行程式。另外,在資料區域性議題方面,我們將介紹資料切割(Data Tiling or Loop Tiling)這項優化方法。資料切割可以增加資料留在快取中的時間,因此可以有效降低快取失誤的次數,進而提高程式的效能。

The current tendencies of computer architecture are toward multi-core architectures which have memory hierarchies and large number of processing units on a single chip. For taking advantage of these architectures, programmers should consider the parallelism and data locality issues when they develop applications.
For the parallelism, OpenMP API is a friendly interface that helps programmers develop parallel program easily. For the data locality, loop-tiling is a crucial loop-transformation which can enhance data reuse in the cache. But, the current Java compiler supports neither OpenMP API nor loop-tiling optimization. To meet the requirement, we proposed a compiler framework – JMC, which can support OpenMP-like directive transformation and loop-tiling optimization for Java programming language. In addition, effective use of loop-tiling needs selection of the tile sizes. We designed a tile size selelction (TSS) scheme for different level cache size and gained good performance.
We evaluate our scheme on the 32-core platform. The experimental result shows either sequential program or parallel program can gain good performance enhancement via loop-tiling optimization. And the average speed up of sequential is 3.87; in the parallel program result, the average speed up is 3.58. Furthermore, according to different thread groups, using the TSS scheme to select fitted tile size for different level cache can enhance the performance.

Abstract i
摘要 ii
致謝 iii
Table of Contents iv
List of Figures vi
List of Tables vii
1. Introduction 1
1.1. Motivation 3
1.2. Purpose 4
1.3. Contribution 5
1.4. Structure of Thesis 5
2. Related Works 6
2.1. OpenMP 6
2.2. OpenMP APIs for Java 8
2.2.1 JOMP – an OpenMP-like Interface for Java 8
2.2.2 JaMP – OpenMP for a Java DSM 8
2.3. Loop Tiling Optimization 9
2.3.1 Previous works of Loop-Tiling Generator 10
3. Compiler and Optimization Design 11
3.1. JMC Overview 11
3.1.1 JMC Parallelism Model 12
3.1.2 JMC Framework 12
3.2. JMC Source-level Compiler 14
3.2.1 OpenMP-like Directive Translation Flowchart 16
3.2.2 Link 16
3.2.3 Expansion 17
3.2.4 Collect Variables 18
3.2.5 Check 19
3.2.6 Create Private Variables 19
3.2.7 Translation 19
3.2.8 Create Shared Context 19
3.3. JMC Runtime Library 20
3.3.1 Platform Independent 21
3.3.2 Execution Environment and Debugging 22
3.3.3 Parallel Team and Task 22
3.4. Loop Tiling Optimization 23
3.4.1 Loop-Tiling Generator 24
3.4.2 Multi-level Tiling 25
3.4.3 Tile Size Selection (TSS) 27
4. Experimental Evaluation 28
4.1. Experimental Environment 28
4.2. Performance Comparison 29
4.3. Different Tile Size Experiment 30
5. Conclusions and Future Works 35
6. References 36

[1] Pluto: A polyhedral automatic parallelizer and locality optimizer for multicores. Available at http://pluto-compiler.sourceforge.net.
[2] CLooG: The Chunky Loop Generator. http://www.cloog.org.
[3] C′edric Bastoul. Code generation in the polyhedral model is easier than you think. In IEEE PACT, pages 7–16, September 2004.
[4] PrimeTile: A Parametric Multi-Level Tiler for Imperfect Loop Nests. http://primetile.sourceforge.net.
[5] S. Coleman and K. McKinley. Tile Size Selection Using Cache Organization and Data Layout. In PLDI’95, pages 279–290, 1995.
[6] HiTLoG: Hierarchical Tiled Loop Generator. Available at http://www.cs.colostate.edu/MMAlpha/tiling/.
[7] D. Kim and S. Rajopadhye. Parameterized tiling for imperfectly nested loops. Technical Report CS-09-101, Colorado State University, Department of Computer Science, February 2009.
[8] TLoG: A Parametrized Tiled Loop Generator. Available at http://www.cs.colostate.edu/MMAlpha/tiling/.
[9] U. Bondhugula, J. Ramanujam, and P. Sadayappan. Pluto: A practical and fully automatic polyhedral parallelizer and locality optimizer. Technical Report OSU-CISRC-10/07-TR70, The Ohio State University, Oct. 2007.
[10] Jingling Xue. Loop tiling for parallelism. Kluwer Academic Publishers, Norwell, MA, USA, 2000.
[11] J. M. Bull , M. E. Kambites, JOMP—an OpenMP-like interface for Java, Proceedings of the ACM 2000 conference on Java Grande, p.44-53, San Francisco, California, United States , June 03-04, 2000.
[12] Michael Klemm, Matthias Bezold, Ronald Veldema, and Michael Philippsen. Jamp: An implementation of openmp for a java dsm. Concurrency and Computation: Practice and Experience, 18(19):2333{2352, 2007.
[13] JavaCC : A Java Compiler Compiler. http://javacc.java.net/
[14] DaeGon Kim , Lakshminarayanan Renganarayanan , Dave Rostron , Sanjay Rajopadhye , Michelle Mills Strout, Multi-level tiling: M for the price of one, Proceedings of the 2007 ACM/IEEE conference on Supercomputing, Reno, Nevada , November 10-16, 2007.
[15] M. Baskaran, A. Hartono, S. Tavarageri, T. Henretty, J. Ramanujam, and P. Sadayappan. Parameterized tiling revisited. In The International Symposium on Code Generation and Optimization (CGO), 2010.
[16] M. E. Wolf and M. S. Lam. A data locality optimizing algorithm. Submitted for publication., 1990.
[17] Monica D. Lam , Edward E. Rothberg , Michael E. Wolf, The cache performance and optimizations of blocked algorithms, Proceedings of the fourth international conference on Architectural support for programming languages and operating systems, p.63-74, , Santa Clara, California, United States, April 08-11, 1991.
[18] Stephanie Coleman , Kathryn S. McKinley, Tile size selection using cache organization and data layout, Proceedings of the ACM SIGPLAN 1995. conference on Programming language design and implementation, p.279-290, La Jolla, California, United States June 18-21, 1995.
[19] Gabriel Rivera , Chau-Wen Tseng, Locality optimizations for multi-level caches, Proceedings of the 1999 ACM/IEEE conference on Supercomputing (CDROM), p.2-es, Portland, Oregon, United States, November 14-19, 1999.
[20] T. Yuki, L. Renganarayanan, Sanjay Rajopadhye, Charles Anderson, Alexandre E. Eichenberger, Kevin O'Brien, Automatic creation of tile size selection models, Proceedings of the 8th annual IEEE/ACM international symposium on Code generation and optimization , NY, USA, 2010.
[21] OpenMP : http://openmp.org/wp/
[22] MPI : Message Passing Interface. http://www.mcs.anl.gov/research/projects/mpi/
[23] Pthread : http://computing.llnl.gov/tutorials/pthreads/
[24] Wei-I Lu, Improving Multi-core Cache Utilization with Data Blocking and Thread Grouping, June, 2010
[25] Wei-Nung Su, JMC : Java Language Compiler and Runtime Library for Multi-Core Platform Based-On OpenMP 3.0 Programming Model, June, 2010

註: 此連結為研究生畢業學校所提供,不一定有電子全文可供下載,若連結有誤,請點選上方之〝勘誤回報〞功能,我們會盡快修正,謝謝!
第一頁 上一頁 下一頁 最後一頁 top