National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)

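Author: 傅勝余
Author (romanized): Fu, Sheng-Yu
Title: 在一個動態轉譯引擎中優化SIMD指令之生成
Title (English): Improvement of SIMD Code Generation in a Dynamic Binary Translator
Advisor: 徐慰中
Advisor (romanized): Hsu, Wei-Chung
Degree: Master's
Institution: National Chiao Tung University (國立交通大學)
Department: Institute of Computer Science and Engineering
Discipline: Engineering
Field: Electrical Engineering and Computer Science
Thesis type: Academic thesis
Year of publication: 2014
Academic year of graduation: 102 (ROC calendar)
Language: English
Number of pages: 40
Keywords (Chinese): 模擬器 (emulator)
Keywords (English): QEMU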
Modern processors are increasingly enhanced with SIMD instructions. For example, the MMX, SSE, and AVX instructions in the x86 architecture and the NEON instruction set in the ARM architecture are all SIMD instructions. Using these instructions can significantly increase application performance, so application binaries are likely to contain a growing fraction of SIMD instructions. However, SIMD instruction translation has not attracted much attention in Dynamic Binary Translation (DBT). For example, in the popular QEMU system emulator, guest SIMD instructions are often emulated with a sequence of scalar instructions even when the host machine has SIMD instructions that support the same parallel computation, leaving a large potential for performance improvement.
In this thesis, we propose two approaches to improve the translation of SIMD instructions in QEMU's DBT: one leverages the existing helper function implementation in QEMU, and the other uses a newly introduced vector IR (Intermediate Representation). Both approaches have been implemented in QEMU with an ARM frontend and an x86-64 backend. In our experiments, the vector IR version of QEMU is 1.01 to 5.55 times faster than the original QEMU on the SPEC2006 CFP benchmarks and 7.61 times faster than the original QEMU on the Linpack benchmark.


Table of Contents
Abstract
Acknowledgements
Table of Contents
List of Figures
Ⅰ. Introduction
Ⅱ. Background and Related Work
2.1 Binary Translator
2.1.1 Static Binary Translator
2.1.2 Dynamic Binary Translator
2.2 SIMD instructions
2.2.1 Intel’s SSE
2.2.2 ARM’s NEON
2.3 QEMU
Ⅲ. Design and Implementation
3.1 Observation and Objective
3.2 Design Issues
3.2.1 Approach 1: Modify the helper functions
3.2.2 Approach 2: Add vector IR to original TCG IR
3.2.3 Approach 1 vs. Approach 2
3.3 Original QEMU Internal Overview
3.4 Implementation Detail for Approach 1
3.4.1 Register a Helper Function
3.4.2 Implementing helper function body
3.4.3 Ask Tiny Code Generator to Generate Helper Function Call
3.5 Vector IR Version QEMU Internal Overview
3.6 Implementation Detail for Approach 2
3.6.1 Register a New TCG IR
3.6.2 The Frontend: From NEON instruction to TCG Vector IR
3.6.3 The Backend: From Vector IR to Host Instruction
3.6.4 Translation Examples
3.6.5 Prologue and Epilogue
Ⅳ. Experimental Result
4.1 Environment
4.2 NEON Instruction Influence
4.3 Performance
4.3.1 SPEC 2006
4.3.2 Single Precision SPEC 2006 CFP and Linpack
Ⅴ. Conclusion and Future Work
Reference


