Graduate student: Yu-ren Lai
Thesis title: Design of the Superscalar Dual-Core Architecture using Single-Issue Out-of-Order Instruction Pipe for Embedded System
Advisor: Jih-chin Chiu
Keywords: Dual-Core; Superscalar; Embedded System; Out-of-Order; Single-Issue
With advances in VLSI technology, integrating multiple processor cores on a single chip has become easier, and more and more applications now run on multi-core architectures. Multi-core systems perform very well on multi-threaded applications, but they gain little or nothing on single-threaded applications, because a single thread runs on only one core. This thesis proposes a dual-core architecture that enhances single-threaded performance in embedded systems. It focuses on four points:
1. Construct a simple out-of-order execution core.
2. Design an instruction analyzer that schedules instructions dynamically.
3. Design a mechanism for sharing operands between the two cores.
4. Design a mechanism that commits instructions synchronously across the two cores, with synchronization detection between them.
Each core is a single-issue, out-of-order instruction pipeline. The instruction analyzer first fetches instructions, detects the dependencies among them, and generates instruction dependence tags; it then schedules the instructions dynamically and dispatches them to the cores. Inside a core, each instruction uses the information in its tags to determine where to obtain its source operands, which allows data to be shared between the two cores. Instructions execute in a data-driven manner, that is, as soon as their operands are available, but they complete in order to preserve program-order correctness. Based on the ARM instruction set, this thesis explores control mechanisms for the interaction between the two cores and ways to accelerate a single thread on the dual-core architecture.
We wrote a behavioral model of the proposed architecture in C as a trace-driven simulation framework, used it to verify the architecture functionally, and evaluated performance with the MediaBench suite. According to the simulation results, the architecture achieves an average speedup of over 1.4x, i.e., a performance gain of 40% or more, compared with a single-core five-stage pipelined architecture.
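The result is quoted both as 1.4x and as 40%. Under the usual definition of speedup as a ratio of execution times (for example, cycle counts on the same trace), these are the same claim; the abstract does not say how per-benchmark results were averaged, so the relation below assumes only that definition:

$$
S = \frac{T_{\text{five-stage}}}{T_{\text{dual-core}}} = 1.4
\quad\Longrightarrow\quad
\frac{T_{\text{dual-core}}}{T_{\text{five-stage}}} = \frac{1}{1.4} \approx 0.714
$$

A 1.4x speedup therefore corresponds to roughly a 28.6% reduction in execution time, not a 40% reduction.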
Abstract
Table of Contents
List of Figures
List of Tables
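Chapter 1 Introduction
1-1 Research Motivation
1-2 Research Objectives
1-3 Thesis Organization
Chapter 2 Related Work
2-1 Overview of Single-Core Architectures
2-2 Overview of Superscalar Processors
2-2-1 Statically Scheduled Superscalar Processors
2-2-2 Dynamically Scheduled Superscalar Processors
2-2-3 Speculative Superscalar Processors
2-3 Overview of Multi-Core Architectures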
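Chapter 3 Design of the Superscalar Dual-Core Architecture with Single-Issue Out-of-Order Execution
3-1 The Superscalar Dual-Core Architecture with Single-Issue Out-of-Order Execution
3-2 Design of the Instruction Analyzer
3-3 Design of the Single-Issue Out-of-Order Core
3-3-1 Fetch Stage
3-3-2 Data Stage
3-3-3 Memory Stage
3-3-4 Commit Stage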
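Chapter 4 Simulation and Analysis
4-1 Architecture Verification
4-1-1 Example Run: Odd-Sum and Even-Sum Program
4-1-2 Example Run: Matrix Multiplication Program
4-1-3 Comparison of the Example Programs Running on Other Architectures
4-2 Performance Simulation
4-2-1 Simulation Environment
4-2-2 Simulator Implementation
4-2-3 Benchmark Programs
4-3 Analysis of Simulation Results
Chapter 5 Conclusion
References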
