跳到主要內容

臺灣博碩士論文加值系統

(34.204.169.230) 您好!臺灣時間:2024/02/26 11:14
字體大小: 字級放大   字級縮小   預設字形  
回查詢結果 :::

詳目顯示

我願授權國圖
: 
twitterline
研究生:龍泰文
研究生(外文):Lung, Tai-Wen
論文名稱:Data Locality Optimizations for Multi-Level Caches in Java Multi-Core Compiler
論文名稱(外文):在Java多核心編譯器中支援針對多層快取的資料區域性優化方法
指導教授:鍾葉青鍾葉青引用關係
指導教授(外文):Chung, Yeh-Ching
口試委員:金仲達徐慰中
口試日期:2011-7-28
學位類別:碩士
校院名稱:國立清華大學
系所名稱:資訊工程學系
學門:工程學門
學類:電資工程學類
論文種類:學術論文
論文出版年:2011
畢業學年度:99
語文別:中文
論文頁數:38
中文關鍵詞:資料區塊優化、編譯器
相關次數:
  • 被引用被引用:0
  • 點閱點閱:291
  • 評分評分:
  • 下載下載:0
  • 收藏至我的研究室書目清單書目收藏:0
科技的進步日新月異,傳統的單核心處理器(Single-Core Processor)已經無法滿足使用者的需求,為了獲得更快更好的效能,處理器正朝著多核心架構(Multi-Core Architecture)發展;多核心架構設計可讓一個處理晶片擁有更多的處理單元(Processor Unit)以及多階層的記憶體架構(例:階層一快取、階層二快取、階層三快取以及主記憶體)。而為了能善用多核心架構所帶來的優勢,程式設計者在設計程式的同時,必須要考量程式平性化以及資料區域性等議題。
在程式平行化方面,我們使用OpenMP為主要工具。OpenMP是一個廣為人知的程式開發介面,目前提供了C/C++/Fortran版本,程式設計者可透過OpenMP提供的語法以及函式開發平行程式。另外,在資料區域性議題方面,我們將介紹資料切割(Data Tiling or Loop Tiling)這項優化方法。資料切割可以增加資料留在快取中的時間,因此可以有效降低快取失誤的次數,進而提高程式的效能。
Java是一個非常多人使用的程式語言,不過目前市面上的Java編譯器並沒有支援OpenMP語法以及資料區域性優化的功能,為了能提供這樣的需求,我們設計了Java多核心編譯器的架構,此架構可以提供Java程式設計者一個OpenMP語法的環境,也支援了資料區域性的優化。此外,為了能讓資料區域性優化更有效率,我們必須決定要切割的資料區塊大小,我們在此提出了一個決定資料區塊大小的方法,
最後,我們用實驗來驗證。根據實驗的結果,透過Java多核心編譯器優化過後的程式,可以比原來Java的程式快上3.58倍。而且也得知,根據不同階層的快取,採用適合的資料區塊大小,確實可以更進一步的影響程式的效能。

The current tendencies of computer architecture are toward multi-core architectures which have memory hierarchies and large number of processing units on a single chip. For taking advantage of these architectures, programmers should consider the parallelism and data locality issues when they develop applications.
For the parallelism, OpenMP API is a friendly interface that helps programmers develop parallel program easily. For the data locality, loop-tiling is a crucial loop-transformation which can enhance data reuse in the cache. But, the current Java compiler supports neither OpenMP API nor loop-tiling optimization. To meet the requirement, we proposed a compiler framework – JMC, which can support OpenMP-like directive transformation and loop-tiling optimization for Java programming language. In addition, effective use of loop-tiling needs selection of the tile sizes. We designed a tile size selelction (TSS) scheme for different level cache size and gained good performance.
We evaluate our scheme on the 32-core platform. The experimental result shows either sequential program or parallel program can gain good performance enhancement via loop-tiling optimization. And the average speed up of sequential is 3.87; in the parallel program result, the average speed up is 3.58. Furthermore, according to different thread groups, using the TSS scheme to select fitted tile size for different level cache can enhance the performance.

Abstract i
摘要 ii
致謝 iii
Table of Contents iv
List of Figures vi
List of Tables vii
1. Introduction 1
1.1. Motivation 3
1.2. Purpose 4
1.3. Contribution 5
1.4. Structure of Thesis 5
2. Related Works 6
2.1. OpenMP 6
2.2. OpenMP APIs for Java 8
2.2.1 JOMP – an OpenMP-like Interface for Java 8
2.2.2 JaMP – OpenMP for a Java DSM 8
2.3. Loop Tiling Optimization 9
2.3.1 Previous works of Loop-Tiling Generator 10
3. Compiler and Optimization Design 11
3.1. JMC Overview 11
3.1.1 JMC Parallelism Model 12
3.1.2 JMC Framework 12
3.2. JMC Source-level Compiler 14
3.2.1 OpenMP-like Directive Translation Flowchart 16
3.2.2 Link 16
3.2.3 Expansion 17
3.2.4 Collect Variables 18
3.2.5 Check 19
3.2.6 Create Private Variables 19
3.2.7 Translation 19
3.2.8 Create Shared Context 19
3.3. JMC Runtime Library 20
3.3.1 Platform Independent 21
3.3.2 Execution Environment and Debugging 22
3.3.3 Parallel Team and Task 22
3.4. Loop Tiling Optimization 23
3.4.1 Loop-Tiling Generator 24
3.4.2 Multi-level Tiling 25
3.4.3 Tile Size Selection (TSS) 27
4. Experimental Evaluation 28
4.1. Experimental Environment 28
4.2. Performance Comparison 29
4.3. Different Tile Size Experiment 30
5. Conclusions and Future Works 35
6. References 36


[1] Pluto: A polyhedral automatic parallelizer and locality optimizer for multicores. Available at http://pluto-compiler.sourceforge.net.
[2] CLooG: The Chunky Loop Generator. http://www.cloog.org.
[3] C′edric Bastoul. Code generation in the polyhedral model is easier than you think. In IEEE PACT, pages 7–16, September 2004.
[4] PrimeTile: A Parametric Multi-Level Tiler for Imperfect Loop Nests. http://primetile.sourceforge.net.
[5] S. Coleman and K. McKinley. Tile Size Selection Using Cache Organization and Data Layout. In PLDI’95, pages 279–290, 1995.
[6] HiTLoG: Hierarchical Tiled Loop Generator. Available at http://www.cs.colostate.edu/MMAlpha/tiling/.
[7] D. Kim and S. Rajopadhye. Parameterized tiling for imperfectly nested loops. Technical Report CS-09-101, Colorado State University, Department of Computer Science, February 2009.
[8] TLoG: A Parametrized Tiled Loop Generator. Available at http://www.cs.colostate.edu/MMAlpha/tiling/.
[9] U. Bondhugula, J. Ramanujam, and P. Sadayappan. Pluto: A practical and fully automatic polyhedral parallelizer and locality optimizer. Technical Report OSU-CISRC-10/07-TR70, The Ohio State University, Oct. 2007.
[10] Jingling Xue. Loop tiling for parallelism. Kluwer Academic Publishers, Norwell, MA, USA, 2000.
[11] J. M. Bull , M. E. Kambites, JOMP—an OpenMP-like interface for Java, Proceedings of the ACM 2000 conference on Java Grande, p.44-53, San Francisco, California, United States , June 03-04, 2000.
[12] Michael Klemm, Matthias Bezold, Ronald Veldema, and Michael Philippsen. Jamp: An implementation of openmp for a java dsm. Concurrency and Computation: Practice and Experience, 18(19):2333{2352, 2007.
[13] JavaCC : A Java Compiler Compiler. http://javacc.java.net/
[14] DaeGon Kim , Lakshminarayanan Renganarayanan , Dave Rostron , Sanjay Rajopadhye , Michelle Mills Strout, Multi-level tiling: M for the price of one, Proceedings of the 2007 ACM/IEEE conference on Supercomputing, Reno, Nevada , November 10-16, 2007.
[15] M. Baskaran, A. Hartono, S. Tavarageri, T. Henretty, J. Ramanujam, and P. Sadayappan. Parameterized tiling revisited. In The International Symposium on Code Generation and Optimization (CGO), 2010.
[16] M. E. Wolf and M. S. Lam. A data locality optimizing algorithm. Submitted for publication., 1990.
[17] Monica D. Lam , Edward E. Rothberg , Michael E. Wolf, The cache performance and optimizations of blocked algorithms, Proceedings of the fourth international conference on Architectural support for programming languages and operating systems, p.63-74, , Santa Clara, California, United States, April 08-11, 1991.
[18] Stephanie Coleman , Kathryn S. McKinley, Tile size selection using cache organization and data layout, Proceedings of the ACM SIGPLAN 1995. conference on Programming language design and implementation, p.279-290, La Jolla, California, United States June 18-21, 1995.
[19] Gabriel Rivera , Chau-Wen Tseng, Locality optimizations for multi-level caches, Proceedings of the 1999 ACM/IEEE conference on Supercomputing (CDROM), p.2-es, Portland, Oregon, United States, November 14-19, 1999.
[20] T. Yuki, L. Renganarayanan, Sanjay Rajopadhye, Charles Anderson, Alexandre E. Eichenberger, Kevin O'Brien, Automatic creation of tile size selection models, Proceedings of the 8th annual IEEE/ACM international symposium on Code generation and optimization , NY, USA, 2010.
[21] OpenMP : http://openmp.org/wp/
[22] MPI : Message Passing Interface. http://www.mcs.anl.gov/research/projects/mpi/
[23] Pthread : http://computing.llnl.gov/tutorials/pthreads/
[24] Wei-I Lu, Improving Multi-core Cache Utilization with Data Blocking and Thread Grouping, June, 2010
[25] Wei-Nung Su, JMC : Java Language Compiler and Runtime Library for Multi-Core Platform Based-On OpenMP 3.0 Programming Model, June, 2010

連結至畢業學校之論文網頁點我開啟連結
註: 此連結為研究生畢業學校所提供,不一定有電子全文可供下載,若連結有誤,請點選上方之〝勘誤回報〞功能,我們會盡快修正,謝謝!
QRCODE
 
 
 
 
 
                                                                                                                                                                                                                                                                                                                                                                                                               
第一頁 上一頁 下一頁 最後一頁 top