National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)

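Author: 許家榮
Author (English): Chia-Jung Hsu
Thesis title: Reducing the Area and Power Requirements of the Branch Target Buffer and Load/Store Queue in High-Performance Processors Using Virtual Address Compression
Thesis title (English): Applying Virtual Address Compression in Branch Target Buffer and Load / Store Queue in high-performance processors
Advisor: 陳中和
Advisor (English): Chung-Ho Chen
Degree: Master's
Institution: National Cheng Kung University (國立成功大學)
Department: Institute of Computer and Communication Engineering
Discipline: Engineering
Field: Electrical Engineering and Computer Science
Thesis type: Academic thesis
Publication year: 2007
Graduation academic year: 95 (ROC calendar)
Language: Chinese
Pages: 79
Keywords (Chinese, translated): virtual address compression; branch target buffer; load/store queue; power consumption
Keywords (English): energy reduction; BTB; branch target buffer; load store queue; LSQ; virtual address compression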
This thesis applies virtual address compression to the virtual addresses stored and compared in the branch target buffer (BTB) and the load/store queue (LSQ) of high-performance processors. The BTB is a cache-like structure that stores virtual addresses, so compressing those addresses reduces its area and power requirements. The LSQ not only stores virtual addresses; it also uses a fully associative content-addressable memory (CAM) to search for address collisions against each virtual address that is about to be placed in the queue. The area and energy cost of this structure and of its search comparisons becomes an increasing concern as the number of in-flight instructions grows.
With virtual address compression, the BTB design reduces area by 53.6%-69.3% and BTB energy consumption by 4.2%-28.5%, without lengthening the original clock cycle and with an instructions-per-cycle (IPC) loss of at most 0.4%. The compressed LSQ design reduces area by 35%-70% and LSQ energy consumption by 39%-72%; with the best compression configuration finally adopted for the LSQ, IPC drops by less than 0.3%. Combining the two designs saves 45%-52% of the total BTB and LSQ area and 2.5%-3.1% of overall processor energy, with an IPC loss of at most 0.2%.
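The abstract does not describe the compression mechanism itself, so the following is only a minimal sketch of one common form of address compression, not the thesis's design. It assumes that the high-order bits of nearby addresses repeat, and that a small table of recently seen high-order patterns lets each entry store a short table index plus the low-order bits. The class name `PatternTable`, the 32-bit address width, the 12 low-order bits, and the 8-entry table are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

ADDR_BITS = 32      # assumed virtual address width
LOW_BITS = 12       # assumed number of low-order bits stored uncompressed
TABLE_ENTRIES = 8   # assumed pattern-table size (gives a 3-bit index)


@dataclass
class PatternTable:
    """Hypothetical table of recently seen high-order address patterns."""
    patterns: List[int] = field(default_factory=list)

    def compress(self, addr: int) -> Optional[Tuple[int, int]]:
        """Return (table index, low bits), or None if a new pattern does not fit."""
        high = addr >> LOW_BITS
        low = addr & ((1 << LOW_BITS) - 1)
        if high not in self.patterns:
            if len(self.patterns) == TABLE_ENTRIES:
                return None  # a real design would need a replacement policy here
            self.patterns.append(high)
        return self.patterns.index(high), low

    def decompress(self, index: int, low: int) -> int:
        """Rebuild the full address from the stored index and low bits."""
        return (self.patterns[index] << LOW_BITS) | low


def stored_bits() -> int:
    """Bits kept per compressed address: table index plus low-order bits."""
    index_bits = (TABLE_ENTRIES - 1).bit_length()
    return index_bits + LOW_BITS


table = PatternTable()
compressed = table.compress(0x7FFF1234)
restored = table.decompress(*compressed)
```

Under these assumed parameters each entry stores 15 bits instead of 32. Narrower entries are the kind of saving that would shrink both the storage array of a BTB and the width of every CAM comparison in an LSQ; the actual figures reported in the abstract depend on the thesis's own table designs and configurations.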
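Abstract
Table of Contents
List of Figures
List of Tables
CHAPTER 1. Introduction
1.1 Motivation
1.2 Contributions
1.3 Organization
CHAPTER 2. Background
2.1. Memory Addressing
2.1.1. Virtual Address Space
2.1.2. Address Binding
2.1.3. Mapping Between Virtual and Physical Addresses
2.2. Branch Target Buffer (BTB)
2.3. Load / Store Queue (LSQ)
CHAPTER 3. Related Work
CHAPTER 4. Virtual Address Compression
4.1. BTB Architecture with Address Compression
4.2. LSQ Architecture with Address Compression
CHAPTER 5. Design and Implementation of Virtual Address Compression
5.1. Instruction Address Pattern Table (IAPT)
5.2. Data Address Pattern Table (DAPT)
5.3. Discussion of the Differences Between the IAPT and the DAPT
5.4. Processor Pipeline Architecture with Virtual Address Compression
5.4.1. Instruction Address Compression
5.4.2. Data Address Compression
5.5. Handling Program Deadlock in Data Address Compression
5.6. Recovery of the DAPT Address Pattern Counters
CHAPTER 6. Simulation and Analysis
6.1. Simulation Environment
6.2. Benchmarks and Address Compression Configurations
6.3. Power Models
6.3.1. Power Model of the CAM (Content-Addressable Memory)
6.3.2. Power Models of the IAPT and the DAPT
6.3.3. Power Models of the BTB and the LSQ with Address Compression
6.4. Simulation Results
6.4.1. Instruction Address Compression Results
6.4.2. Data Address Compression Results
6.4.3. Results of Combined Virtual Address Compression
6.5. Comparison of Results
6.5.1. Reducing the Number of LSQ Search Comparisons
6.5.2. Set-Associative LSQ Architecture
CHAPTER 7. Conclusions and Future Work
7.1. Conclusions
7.2. Future Work
REFERENCES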