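Graduate student: 王鈺晟
Graduate student (romanized): Wang, Yu-Chen
Thesis title: Investigation of Polyhedral Transformations on CPU and GPU
Thesis title (Chinese): 探討在CPU與GPU上之多面體轉換最佳化 (Investigating polyhedral transformation optimization on CPU and GPU)
Advisor: 李政崑
Advisor (romanized): Lee, Jenq-Kuen
Degree: Master's
Institution: National Tsing Hua University (國立清華大學)
Department: Department of Computer Science (資訊工程學系)
Discipline: Engineering
Field: Electrical Engineering and Computer Science
Thesis type: Academic thesis
Year of publication: 2013
Graduation academic year: 101
Language: English
Number of pages: 35
Chinese keywords: OpenCL (開放運算語言); polyhedral model (多面體模型)
English keywords: Polyhedral Model; OpenCL; RenderScript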
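The polyhedral model is a powerful mathematical framework for loop optimization and parallelization that has been developed over several decades. It has become popular because of its rich mathematical foundations. Many frameworks have been built on the polyhedral model, and they can be combined with compiler techniques to transform and analyze loop code. In this thesis, our goal is to investigate how polyhedral transformations apply to different heterogeneous multi-core platforms. To uncover more of their potential applicability, we target Android RenderScript and OpenCL, for which little polyhedral-model research exists.
RenderScript is a part of the Android operating system that provides a low-level API for heterogeneous computing. We integrate LLVM Polly into the RenderScript online JIT compiler (libbcc) to achieve kernel-level optimization through polyhedral transformations. In the experiments, we rewrote PolyBench in RenderScript and compared performance before and after optimization.
OpenCL is another framework for writing programs on heterogeneous platforms. In this thesis, we conduct an experimental study of polyhedral transformations on OpenCL kernel functions. In the experiments, we use PolyBench/GPU to evaluate performance after optimization.
The experimental results show that polyhedral transformations are more widely applicable than we expected.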

The polyhedral model, a powerful mathematical framework for loop nest optimization and parallelization, has been developed for decades. It has become popular because of its rich mathematical foundations. Many frameworks have been built on the polyhedral model, and they can be combined with compiler techniques to transform and analyze loop nests. In this thesis, we investigate how polyhedral transformations can be applied on different heterogeneous multi-core platforms. To explore their wider applicability, we target RenderScript on the Android platform and OpenCL, both of which have received little polyhedral-model research.
RenderScript is a component of the Android operating system that provides low-level APIs for heterogeneous computing. We perform kernel-level optimization with polyhedral transformations by integrating LLVM Polly into the RenderScript online JIT compiler (libbcc). In the experiment, we reimplement the PolyBench benchmark in RenderScript and compare performance before and after the optimizations on Android 4.1.1 Jelly Bean; on average, we obtain a 58% speedup in execution time.
OpenCL is another framework for writing programs that run on heterogeneous platforms. In this thesis, we conduct an experimental study of polyhedral transformations applied to OpenCL kernel functions. We use the PolyBench/GPU benchmark to evaluate performance after the optimization. Loop tiling yields a 58% improvement on average, and loop interchange yields a speedup of more than 2x.
The experimental results show that polyhedral transformations are more widely applicable than we expected.
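
The abstract names two loop transformations, loop interchange and loop tiling, without showing them. The following is a minimal sketch in plain C of what each does to a matrix-multiply loop nest. It is not the thesis's OpenCL or RenderScript code, nor the output of Polly or PoCC; the kernel choice, the function names, and the constants N and TILE are assumptions made for illustration.

```c
#include <stddef.h>

#define N 64     /* matrix dimension (hypothetical, for illustration) */
#define TILE 16  /* tile edge (hypothetical); must divide N evenly */

_Static_assert(N % TILE == 0, "TILE must divide N");

/* Baseline i-j-k order: the innermost loop walks down a column of B,
 * so consecutive accesses to B are N elements apart in memory. */
void matmul_ijk(double A[N][N], double B[N][N], double C[N][N]) {
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < N; j++) {
            double sum = 0.0;
            for (size_t k = 0; k < N; k++)
                sum += A[i][k] * B[k][j];
            C[i][j] = sum;
        }
}

/* Loop interchange: the j and k loops are swapped (i-k-j order), so the
 * innermost loop now walks along rows of both B and C. */
void matmul_ikj(double A[N][N], double B[N][N], double C[N][N]) {
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < N; j++)
            C[i][j] = 0.0;
    for (size_t i = 0; i < N; i++)
        for (size_t k = 0; k < N; k++) {
            double a = A[i][k];
            for (size_t j = 0; j < N; j++)
                C[i][j] += a * B[k][j];
        }
}

/* Loop tiling: each loop is split into a tile loop (ii, kk, jj) and an
 * intra-tile loop, so the nest works on TILE x TILE blocks at a time. */
void matmul_tiled(double A[N][N], double B[N][N], double C[N][N]) {
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < N; j++)
            C[i][j] = 0.0;
    for (size_t ii = 0; ii < N; ii += TILE)
        for (size_t kk = 0; kk < N; kk += TILE)
            for (size_t jj = 0; jj < N; jj += TILE)
                for (size_t i = ii; i < ii + TILE; i++)
                    for (size_t k = kk; k < kk + TILE; k++) {
                        double a = A[i][k];
                        for (size_t j = jj; j < jj + TILE; j++)
                            C[i][j] += a * B[k][j];
                    }
}

/* Returns 1 if the two matrices are element-wise equal, 0 otherwise. */
int same_matrix(double X[N][N], double Y[N][N]) {
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < N; j++)
            if (X[i][j] != Y[i][j])
                return 0;
    return 1;
}
```

All three functions compute the same product; only the order in which iterations execute differs. Whether a given order is faster depends on the target's memory hierarchy, which is why measured speedups such as those reported above are specific to the platform and benchmark.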

Abstract
Contents
List of Figures
1 Introduction
1.1 Introduction
1.2 Polyhedral Framework Overview
1.2.1 Program Analysis Phase
1.2.2 Program Transformation Phase
1.2.3 Code Generation Phase
1.3 Thesis Overview
2 Kernel Level Optimization for RenderScript
2.1 RenderScript Overview
2.1.1 RenderScript Design Principles
2.1.2 RenderScript Compilation Flow
2.2 LLVM Polly Introduction
2.3 Kernel Level Optimization for RenderScript
2.4 Experiment
2.4.1 Experimental Setup
2.4.2 Experimental Result
3 Experimental Research of OpenCL Kernel Optimization
3.1 OpenCL Overview
3.2 PoCC Introduction
3.3 Experimental Research of OpenCL Kernel Optimization
3.3.1 Motivation
3.3.2 Optimization Process
3.3.3 A Case Study
3.4 Experiment
3.4.1 Experimental Setup
3.4.2 Experimental Result
4 Conclusion
References