
National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)

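Student: 劉家倫 (Liu, Chia-Lun)
Thesis title: 針對多執行緒並共享Code Cache的動態二元碼轉換器
Thesis title (English): Dynamic Binary Translation for Multi-Threaded Programs with Shared Code Cache
Advisor: 楊武 (Yang, Wuu)
Degree: Master's
Institution: National Chiao Tung University (國立交通大學)
Department: Institute of Computer Science and Engineering
Field: Engineering
Discipline: Electrical and Computer Engineering
Thesis type: Academic thesis
Year of publication: 2013
Academic year of graduation: 101 (ROC calendar)
Language: English
Number of pages: 58
Keywords (Chinese, translated): dynamic binary translator; parallel emulation; dynamic optimization; multithreading
Keywords (English): Dynamic binary translation; Parallel emulation; Dynamic optimization; Threads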
Statistics: cited 0 times; viewed 300 times; downloaded 12 times; bookmarked 0 times.
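Chinese abstract (translated):
We extended an existing binary translation platform, mc2llvm, so that it can emulate multi-threaded programs, using a shared code cache. Emulating multi-threaded programs raises two important issues: (1) the emulator must know exactly how each thread is created and terminated during execution and how its memory is allocated; (2) emulation must be efficient: besides any synchronization in the program itself, the emulator needs its own synchronization in order to emulate multiple threads, and this makes emulation inefficient. We reduce the synchronization overhead by (1) shortening the regions enclosed by lock/unlock pairs, (2) using concurrent data structures, and (3) using thread-private memory. Because most current machines are multi-core, we use idle cores to generate traces, which speeds up emulation. In our experiments, programs that run with 8 threads are emulated 8.8X faster than with QEMU.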
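The record does not describe mc2llvm's data structures, so the following is only a minimal Python sketch, under stated assumptions, of how the three techniques above can fit together around a shared code cache. Each emulated thread first checks a thread-private table with no locking; on a miss it probes the global mapping table inside a short critical section, translates outside the lock, and takes the lock again only to publish the result. The class `SharedCodeCache`, the `translate` callback, and the strings standing in for host code are hypothetical, and a plain `threading.Lock` stands in for the concurrent data structures the thesis actually uses.

```python
import threading
from typing import Callable, Dict


class SharedCodeCache:
    """Hypothetical sketch: guest PC -> translated host code, shared by all
    emulated threads and fronted by a per-thread private table."""

    def __init__(self, translate: Callable[[int], str]) -> None:
        self._translate = translate          # stand-in for the instruction translator
        self._global: Dict[int, str] = {}    # shared mapping table, guarded by _lock
        self._lock = threading.Lock()
        self._local = threading.local()      # each thread's private table lives here
        self.published = 0                   # blocks inserted into the shared table

    def _private(self) -> Dict[int, str]:
        table = getattr(self._local, "table", None)
        if table is None:
            table = self._local.table = {}
        return table

    def lookup(self, guest_pc: int) -> str:
        private = self._private()
        # Fast path: a thread-private hit needs no synchronization at all.
        block = private.get(guest_pc)
        if block is not None:
            return block
        # Short critical section 1: only probe the shared table.
        with self._lock:
            block = self._global.get(guest_pc)
        if block is None:
            # Translate outside the lock so other threads are not blocked meanwhile.
            candidate = self._translate(guest_pc)
            # Short critical section 2: publish, unless another thread got there first.
            with self._lock:
                block = self._global.get(guest_pc)
                if block is None:
                    self._global[guest_pc] = candidate
                    self.published += 1
                    block = candidate
        # Remember the result locally so later lookups skip the lock entirely.
        private[guest_pc] = block
        return block
```

Keeping translation outside the lock means two threads can occasionally translate the same block; this sketch keeps whichever result is published first and discards the other, trading a little duplicated work for much shorter lock hold times.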
Abstract:
We present a process-level ARM-to-x86/64 dynamic binary translator, based on mc2llvm, that can efficiently emulate multi-threaded binaries. The main difficulty in translating multi-threaded binaries is the synchronization overhead incurred by the translator itself, which has a great impact on performance. We identify the synchronization bottlenecks and address them by (1) shortening lock sections as much as possible, (2) using concurrent data structures, and (3) using thread-private memory. In addition, we add trace compilation to mc2llvm to speed up emulation; code generation for traces is done by dedicated threads in our system. In our experiments, our system is 8.8X faster than QEMU when emulating benchmarks with 8 guest threads.
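The abstract states that trace code generation runs on dedicated threads, using otherwise idle cores. A minimal Python sketch of that producer/consumer arrangement, assuming a simple execution-count threshold for hotness: emulation threads count block executions and, once a block head becomes hot, enqueue it exactly once; a background compiler thread drains the queue and installs the finished trace. `TraceCompiler`, `HOT_THRESHOLD`, and `build_trace` are hypothetical names, and the string returned by `build_trace` stands in for LLVM-generated host code.

```python
import queue
import threading
from typing import Callable, Dict, Optional, Set

HOT_THRESHOLD = 50  # hypothetical: executions before a block head counts as hot


class TraceCompiler:
    """Hypothetical sketch: emulation threads enqueue hot trace heads; one
    dedicated compiler thread builds traces off the emulation critical path."""

    def __init__(self, build_trace: Callable[[int], str]) -> None:
        self._build = build_trace
        self._queue: "queue.Queue[Optional[int]]" = queue.Queue()  # thread-safe FIFO
        self._lock = threading.Lock()
        self._counts: Dict[int, int] = {}
        self._requested: Set[int] = set()   # avoids queuing the same trace twice
        self.traces: Dict[int, str] = {}    # head PC -> compiled trace
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def record_execution(self, head_pc: int) -> None:
        """Called by an emulation thread each time it runs the block at head_pc."""
        with self._lock:
            n = self._counts.get(head_pc, 0) + 1
            self._counts[head_pc] = n
            hot = n >= HOT_THRESHOLD and head_pc not in self._requested
            if hot:
                self._requested.add(head_pc)
        if hot:
            self._queue.put(head_pc)        # unbounded queue: put never blocks

    def _run(self) -> None:
        while True:
            head_pc = self._queue.get()
            if head_pc is None:             # shutdown sentinel
                break
            trace = self._build(head_pc)    # expensive work runs on this thread only
            with self._lock:
                self.traces[head_pc] = trace

    def shutdown(self) -> None:
        self._queue.put(None)
        self._worker.join()
```

Because `record_execution` only enqueues and returns, emulation threads never wait on trace compilation; they keep running block-level code until the trace appears in `traces`.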
Contents
Chinese Abstract
Abstract
Acknowledgements
List of Figures
1 Introduction
2 Related Work
2.1 Binary Translation
2.2 Shared Code Cache
2.3 Trace Selection
3 Background
3.1 Program flow of mc2llvm
3.2 Thread Creation and Termination
3.3 Manipulation of TLS Base
3.4 Atomic Operations
4 Design and Implementation
4.1 Memory Initialization
4.2 State Mapping
4.3 Emulating Threads
4.4 Instruction Translation
4.5 Address Mapping Table and Shared Code Cache
4.6 System Call Handler
4.7 Emulate multi-threaded programs
4.7.1 How to emulate thread creation and termination?
4.7.2 How to emulate the atomic operations?
4.7.3 How to access TLS base address?
4.7.4 What are shared in the translation system?
4.8 32-bit ARM on 64-bit x64 machine
4.8.1 Memory Address Space
4.8.2 ABI Size of System Call Parameter
5 Optimization
5.1 LLVM IR optimizations
5.2 Active Chaining with Unconditional Direct Branches
5.3 Trace Compilation
5.3.1 Trace Selection
5.3.2 Trace Generation
5.4 Filling empty thread-private table
6 Synchronization
6.1 Instruction Translator
6.2 Access to Global Mapping Table
6.3 Trace Queue
6.4 Repeated Trace Detection
6.5 Translation of kuser cmpxchg
6.6 LLVM Library
7 Experiment Result
7.1 Experimental Setup
7.2 Parallel Emulation
7.3 Time Usage
7.4 Various Statistics
7.5 The Impact of Trace Compilation
7.6 Comparison with QEMU
7.7 Comparison with Native Machine
7.8 Future Work
8 Conclusion
Bibliography

