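Student: 吳永崇 (Yung-Chung Wu)
Thesis title (Chinese): 高性能處理機多區塊存取引擎之設計 (Design of a Multiple-Block Fetch Engine for High-Performance Processors)
Thesis title (English): A Multiple Blocks Fetch Engine for High Performance Superscalar Processor
Advisor: 謝忠健 (Jong-Jiann Shieh)
Degree: Master's
Institution: Tatung University
Department: Graduate Institute of Computer Science and Engineering
Field: Engineering
Discipline: Electrical and Information Engineering
Thesis type: Academic thesis
Year of publication: 2001
Academic year of graduation: 89 (ROC calendar, i.e. 2000-2001)
Language: English
Pages: 70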
Keywords: Branch Prediction; Multiple Branch Prediction; Branch Target Buffer; Enhanced Branch Target Buffer; Outstanding Branch Queue
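Chinese Abstract
Modern high-performance computer implementations increasingly add parallelism in hardware, and instruction fetch bandwidth is becoming the main performance bottleneck. Most current fetch units can predict only one branch per cycle and can therefore fetch at most one basic block per cycle. Fetching one basic block per cycle is sufficient for processors that issue up to four instructions per cycle, but it falls short for processors that can issue more. If multiple-block prediction is adopted, the fetch unit can fetch several consecutive basic blocks and keep up with the processor's high issue rate.
We consider two main components for improving the fetch unit so that it can fetch more than one basic block per cycle: (1) predicting the paths of multiple branches in each cycle, and (2) generating, within one cycle, the addresses needed to fetch nonconsecutive blocks. We integrate one approach that provides both: (1) a highly accurate branch predictor that can predict three branches in a single cycle, and (2) an enhanced branch target buffer (EBTB) that stores the addresses of the basic blocks along the direction of the instruction stream.
We analyze performance as a function of the size of the EBTB we designed, examine the problems caused by multiple branch prediction, and look for solutions to them. Finally, we examine the relationship between the hit rate of the EBTB with multiple-branch fetch and overall performance. With this mechanism, FIPC rises from 3.3 to 4.8 and IPC from 2.3 to 2.7, and the EBTB hit rate reaches 0.98.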
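Taking the reported figures at face value, the relative improvements in fetch rate and overall throughput work out as follows:

```latex
\[
\frac{\mathrm{FIPC}_{\text{new}} - \mathrm{FIPC}_{\text{base}}}{\mathrm{FIPC}_{\text{base}}}
  = \frac{4.8 - 3.3}{3.3} = \frac{1.5}{3.3} \approx 0.45,
\qquad
\frac{\mathrm{IPC}_{\text{new}} - \mathrm{IPC}_{\text{base}}}{\mathrm{IPC}_{\text{base}}}
  = \frac{2.7 - 2.3}{2.3} = \frac{0.4}{2.3} \approx 0.17
\]
```

That is, FIPC improves by roughly 45% while IPC improves by roughly 17%, so only part of the added fetch bandwidth shows up as overall throughput.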

Abstract
The implementation of modern high-performance computers is increasingly directed toward hardware parallelism, which makes instruction fetch bandwidth the performance bottleneck. Most current fetch units are limited to one branch prediction per cycle and therefore can fetch no more than one basic block per cycle. Fetching a single basic block each cycle is sufficient for implementations that issue at most four instructions per cycle, but not for processors with higher peak issue rates. With multiple-block prediction, the fetch unit can fetch at least multiple contiguous basic blocks per cycle.
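To see why a one-block-per-cycle limit caps the fetch rate, the following Python sketch counts instructions delivered per cycle for a stream of basic blocks. The function name `fetch_ipc`, the block sizes, and the 16-instruction fetch width are illustrative assumptions, not parameters from the thesis, and the model assumes perfect branch prediction and an instruction cache that always hits.

```python
def fetch_ipc(block_sizes, max_blocks, fetch_width):
    """Instructions fetched per cycle for a sequence of basic blocks.

    Hypothetical model: perfect branch prediction and a perfect I-cache.
    Each cycle the fetch unit may touch at most `max_blocks` basic blocks
    and deliver at most `fetch_width` instructions; a block larger than
    the remaining width is split across cycles.
    """
    if fetch_width < 1 or max_blocks < 1:
        raise ValueError("fetch_width and max_blocks must be positive")
    if not block_sizes:
        return 0.0
    cycles = 0
    i = 0                      # index of the current basic block
    left = block_sizes[0]      # instructions still to fetch from block i
    while i < len(block_sizes):
        cycles += 1
        budget = fetch_width   # fetch slots left this cycle
        blocks = 0             # basic blocks touched this cycle
        while i < len(block_sizes) and budget > 0 and blocks < max_blocks:
            take = min(left, budget)
            budget -= take
            left -= take
            blocks += 1
            if left == 0:      # block finished: move on to the next one
                i += 1
                if i < len(block_sizes):
                    left = block_sizes[i]
    return sum(block_sizes) / cycles


# Hypothetical basic-block sizes (instructions per block).
blocks = [5, 6, 4, 7, 3, 5]
print(fetch_ipc(blocks, max_blocks=1, fetch_width=16))  # 5.0
print(fetch_ipc(blocks, max_blocks=3, fetch_width=16))  # 15.0
```

With these made-up sizes, allowing three blocks per cycle raises the delivered rate from 5.0 to 15.0 instructions per cycle, because short basic blocks leave most of the fetch width unused when only one block can be fetched.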
Fetching more than one basic block each cycle requires two essential components: predicting multiple branches each cycle, and generating fetch addresses each cycle for basic blocks that may be nonconsecutive. As an integrated solution to both problems, we use a highly accurate branch predictor capable of predicting multiple branches in a single cycle, and an enhanced branch target buffer (EBTB) that provides the addresses of the basic blocks to which the branches direct the instruction flow.
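One way to predict several branches in a single cycle with a two-level adaptive predictor is to predict each later branch using the global history extended with the earlier predictions. The Python sketch below assumes a GAg-style organization (one global history register indexing one pattern history table of 2-bit counters), which the appendix title "BASE GAG MODEL" suggests is the baseline; the class name, history length, and counter initialization are hypothetical. The loop is sequential for clarity, whereas a hardware design would typically read the candidate counters in parallel and select among them using the earlier predictions. `width=3` matches the three branches per cycle stated in the Chinese abstract.

```python
class MultiBranchGAg:
    """Hypothetical GAg-style predictor that predicts `width` branches per cycle."""

    def __init__(self, history_bits=12, width=3):
        self.mask = (1 << history_bits) - 1
        self.width = width
        self.ghr = 0                            # global history register (1 = taken)
        self.pht = [2] * (1 << history_bits)    # 2-bit counters, init weakly taken

    def predict(self):
        """Return `width` predictions (True = taken) for the next branches."""
        hist = self.ghr
        preds = []
        for _ in range(self.width):
            taken = self.pht[hist] >= 2
            preds.append(taken)
            # Later branches see the history extended with earlier predictions.
            hist = ((hist << 1) | int(taken)) & self.mask
        return preds

    def update(self, outcomes):
        """Train on resolved outcomes in program order and advance the GHR."""
        for taken in outcomes:
            c = self.pht[self.ghr]
            self.pht[self.ghr] = min(c + 1, 3) if taken else max(c - 1, 0)
            self.ghr = ((self.ghr << 1) | int(taken)) & self.mask


# Train on a repeating taken, taken, not-taken pattern.
p = MultiBranchGAg(history_bits=6, width=3)
for _ in range(50):
    p.update([True, True, False])
print(p.predict())  # [True, True, False]
```

This sketch updates the history only with resolved outcomes; it does not model how speculative history is repaired after a misprediction.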
We analyze how the size of the enhanced branch target buffer affects performance, investigate the problems caused by multiple branch prediction, and explore solutions to them. Finally, we discuss the relationship between the EBTB hit rate and performance. With this mechanism, FIPC increases from 3.3 to 4.8 and IPC from 2.3 to 2.7, and the EBTB hit rate reaches 0.98.
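The abstract does not spell out the EBTB's internal organization. As a rough illustration only, the Python sketch below assumes a direct-mapped table indexed by the fetch address, where each entry maps a prefix of predicted branch outcomes to the start address of the next basic block on that path, so one lookup can yield several, possibly nonconsecutive, fetch addresses. The class and method names, the 4-byte instruction size, and the miss policy (fall back to sequential fetch) are hypothetical. Shrinking `entries` makes more fetch addresses share a slot and evict each other, which is one way table size can affect hit rate.

```python
class EBTB:
    """Hypothetical direct-mapped enhanced BTB for multiple-block fetch."""

    def __init__(self, entries=512):
        self.entries = entries
        self.table = [None] * entries  # slot -> (tag, {outcome_prefix: block_addr})
        self.lookups = 0
        self.hits = 0

    def _index_tag(self, pc):
        word = pc >> 2                 # assume 4-byte instructions
        return word % self.entries, word // self.entries

    def lookup(self, pc, predictions):
        """Start addresses of the blocks along the predicted path from `pc`."""
        self.lookups += 1
        idx, tag = self._index_tag(pc)
        entry = self.table[idx]
        if entry is None or entry[0] != tag:
            return []                  # miss: caller falls back to sequential fetch
        self.hits += 1
        addrs = []
        for n in range(1, len(predictions) + 1):
            addr = entry[1].get(tuple(predictions[:n]))
            if addr is None:           # this path has not been recorded yet
                break
            addrs.append(addr)
        return addrs

    def update(self, pc, outcomes, block_addrs):
        """Record a resolved path: outcomes[:n] led to block_addrs[n - 1]."""
        idx, tag = self._index_tag(pc)
        entry = self.table[idx]
        if entry is None or entry[0] != tag:
            entry = (tag, {})          # allocate, evicting any conflicting entry
            self.table[idx] = entry
        for n in range(1, len(outcomes) + 1):
            entry[1][tuple(outcomes[:n])] = block_addrs[n - 1]

    def hit_rate(self):
        return self.hits / self.lookups if self.lookups else 0.0


e = EBTB(entries=4)
e.update(0x100, [True, False], [0x200, 0x240])
print(e.lookup(0x100, [True, False, True]))  # [512, 576]
print(e.lookup(0x300, [True]))               # [] (same slot, different tag)
print(e.hit_rate())                          # 0.5
```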

Chapter 1 Introduction
Chapter 2 Related Work
2.1 THE MULTIPLE BRANCH TWO-LEVEL ADAPTIVE BRANCH PREDICTOR
2.2 OUTSTANDING BRANCH QUEUE
Chapter 3 Fetching Multiple Basic Blocks Each Cycle
3.1 THE HARDWARE FOR MULTIPLE BRANCH TWO-LEVEL ADAPTIVE BRANCH PREDICTOR
3.2 OUTSTANDING BRANCH QUEUE
3.3 ENHANCED BRANCH TARGET BUFFER
Chapter 4 Algorithms for the Enhanced Fetch Engine
4.1 THE MULTIPLE BRANCH PREDICTOR ALGORITHM
4.2 THE OUTSTANDING BRANCH QUEUE ALGORITHM
4.3 THE ENHANCED BRANCH TARGET BUFFER ALGORITHM
Chapter 5 Simulation Environment
5.1 SIMULATOR OVERVIEW
5.1.1 Out-of-order processor timing simulation
5.1.2 Fetch stage
5.1.3 Dispatch stage
5.1.4 Issue stage
5.1.5 Writeback stage
5.1.6 Commit stage
5.2 THE BENCHMARK AND INPUT SET
Chapter 6 Performance Evaluation
6.1 SIMULATED CONFIGURATION
6.2 THE STATIC BRANCH ANALYSIS
6.3 THE MULTIPLE BRANCH PREDICTION ACCURACY WITH OBQ
6.5 PERFORMANCE OF ENHANCED BRANCH TARGET BUFFER
Chapter 7 Conclusion and Future Work
7.1 CONCLUSION
References
Appendix A Configuration File
A.1 BASE GAG MODEL
A.2 THE EBTB MODEL
