
National Digital Library of Theses and Dissertations in Taiwan

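Author: Kao, Chung-Min (高崇閔)
Thesis title: The LLVM based GPU Compiler in Heterogeneous System Architecture Emulator: HTranslator (基於LLVM技術開發之異質核心模擬器中GPU編譯器 : HTranslator)
Advisor: Chung, Yeh-Ching (鍾葉青)
Degree: Master's
Institution: National Tsing Hua University
Department: Department of Computer Science
Discipline: Engineering
Field: Electrical Engineering and Computer Science
Thesis type: Academic thesis
Year of publication: 2013
Academic year of graduation: 101 (ROC calendar)
Language: English
Number of pages: 31
Keywords: heterogeneous system architecture; SIMD; GPU; emulator; compiler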
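Heterogeneous System Architecture (HSA) is an industry standard defined by the HSA Foundation, whose members include many major application processor vendors such as AMD, ARM, MediaTek, Samsung, and Qualcomm. Working from an emulator developed according to this standard, this thesis describes the design and implementation of the compiler for the emulator's GPU component, which generates Single Instruction Multiple Data (SIMD) instructions as an optimization.
When GPU execution is emulated under HSA, the CPU has far fewer threads than a physical GPU. If each GPU execution is handed to a single CPU thread, every thread is assigned several units of work that the physical GPU would have run and executes them one after another. In most cases the GPU executes the same instructions on different data. Adding SIMD instructions in such cases lets the hardware process several data items within one SIMD instruction, so that one thread completes work that previously required several threads. This improves the emulator's execution efficiency and brings it closer to how a GPU actually operates.
When conditional branch instructions are present, different GPU threads may branch to different target addresses, so SIMD instructions cannot be used directly for the emulation. The compiler must therefore reconstruct the program's control flow before generating machine code, ensuring that every instruction in the block pointed to by any target address is executed. To keep the results correct, a bitmap records the outcome of each GPU thread's conditional branch: when a conditional branch occurs, whether each GPU thread takes the branch is written into the bitmap, and for GPU threads that should not execute the instructions at that target address, the bitmap is used to mask their results.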

Heterogeneous System Architecture (HSA) is an open industry standard formulated by the HSA Foundation. Many application processor vendors, such as AMD, ARM, MediaTek, Samsung, and Qualcomm, are members. This thesis focuses on an emulator based on this standard and describes the design of its GPU compiler. In addition, the compiler emits Single Instruction Multiple Data (SIMD) instructions to speed up the emulator's execution.
When GPU execution is simulated under the heterogeneous system architecture, the CPU has far fewer threads than a physical GPU. If the emulator assigns the physical GPU's tasks to CPU threads, each thread receives more than one task and completes them one by one. In most situations, a physical GPU executes the same instructions on different data. In these cases, SIMD instructions can be added to speed up execution: with hardware support, the emulator handles several data items at once within the cycles of a single SIMD instruction, so one thread completes tasks that were previously spread across several threads. This improves the emulator's performance and is much closer to how a physical GPU executes.
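The batching idea in this paragraph can be sketched in a few lines of Python. This is a minimal illustration and not code from the thesis: `SIMD_WIDTH`, `run_scalar`, `vector_add`, and `run_simd` are hypothetical names, the kernel (element-wise addition) is chosen only for the example, and a Python list stands in for a hardware vector register.

```python
SIMD_WIDTH = 4  # hypothetical lane count; the real width depends on the host ISA


def run_scalar(a, b):
    """Emulate one GPU task per loop iteration; returns (results, ops issued)."""
    out, issued = [], 0
    for x, y in zip(a, b):
        out.append(x + y)  # one scalar add per task
        issued += 1
    return out, issued


def vector_add(xs, ys):
    """Stand-in for a single SIMD add: every lane is combined in one step."""
    return [x + y for x, y in zip(xs, ys)]


def run_simd(a, b, width=SIMD_WIDTH):
    """Emulate `width` GPU tasks per iteration; returns (results, ops issued)."""
    out, issued = [], 0
    for start in range(0, len(a), width):
        # The last group may be shorter than `width` when len(a) is not a multiple.
        out.extend(vector_add(a[start:start + width], b[start:start + width]))
        issued += 1  # one (modelled) SIMD add covers the whole group
    return out, issued
```

With 10 tasks and a width of 4, the scalar loop issues 10 adds while the batched loop issues 3, and both produce the same output.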
When conditional branch instructions exist, different GPU threads may jump to different target addresses, so they cannot be simulated by SIMD instructions directly. To resolve this, before the compiler generates target code it reconstructs the program's control flow so that every instruction in the blocks pointed to by the target addresses is executed. To ensure that the result is still correct after SIMD instructions are added and the control flow is reconstructed, a bitmap records the conditional jump outcome of each GPU thread. When a conditional jump instruction is reached, the outcome for each GPU thread is written into the bitmap. For GPU threads that should not execute the instructions in a block, the emulator uses the bitmap to mask the result.
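The masking scheme described here can be illustrated with a small Python sketch. It is an assumption-laden model and not the thesis's implementation: the function names (`build_bitmap`, `masked_store`, `run_linearized`, `run_reference`), the example kernel (double even values, increment odd ones), and the use of a Python integer as the bitmap are all hypothetical. Both sides of the branch run for every lane, and the bitmap decides which results are kept.

```python
def build_bitmap(values, predicate):
    """Pack one branch outcome per lane into an integer; bit i belongs to lane i."""
    bitmap = 0
    for lane, v in enumerate(values):
        if predicate(v):
            bitmap |= 1 << lane
    return bitmap


def masked_store(dest, src, bitmap):
    """Keep src[lane] where the lane's bit is set, otherwise keep dest[lane]."""
    return [s if (bitmap >> lane) & 1 else d
            for lane, (d, s) in enumerate(zip(dest, src))]


def run_linearized(x):
    """Branch-free form: run both blocks on all lanes, then mask the results."""
    all_lanes = (1 << len(x)) - 1
    taken = build_bitmap(x, lambda v: v % 2 == 0)  # lanes that take the branch
    result = list(x)
    then_vals = [v * 2 for v in x]                 # "then" block, every lane
    result = masked_store(result, then_vals, taken)
    else_vals = [v + 1 for v in x]                 # "else" block, every lane
    result = masked_store(result, else_vals, all_lanes & ~taken)
    return taken, result


def run_reference(x):
    """Ordinary per-lane branching, used only to check the masked version."""
    return [v * 2 if v % 2 == 0 else v + 1 for v in x]
```

For the input `[1, 2, 3, 4]`, lanes 1 and 3 take the branch, so the bitmap is `0b1010`, and the masked result matches the per-lane reference. The cost of this approach is that every lane pays for both blocks, which is the trade-off for keeping all lanes in a single instruction stream.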

Chapter 1 Introduction
Chapter 2 Background
2.1 Heterogeneous System Architecture
2.2 Heterogeneous System Architecture Intermediate Language
Chapter 3 Related Work
3.1 HSA Emulator: HSAemu
3.2 Tiny Code Generator
3.3 Multi2Sim
3.4 Whole-Function Vectorization
3.5 LLVM infrastructure
Chapter 4 Translator in GPU Simulation
4.1 GPU simulation in HSA Emulator
4.2 GPU Just-In-Time Translator
4.2.1 Just-In-Time Translator
4.2.2 Linker and Loader
4.2.3 Special Instruction Simulation
Memory Relative Instruction
Mathematical Instruction
Kernel Information Instruction
Synchronization Instruction
Chapter 5 SIMD Instruction in GPU Simulation
5.1 Single Instruction Multiple Data
5.2 The Control Flow Graph Reconstruction
5.3 How to Do Bitmap Masking?
Chapter 6 Experiment Results and Discussion
6.1 Benchmarks
6.2 Experimental results
Chapter 7 Conclusion and Future Work
References

