National Digital Library of Theses and Dissertations in Taiwan

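Author: 張彥中
Author (English): Nelson Yen-Chung Chang
Thesis title: 系統晶片中快取記憶體預取技術及排線橋設計
Thesis title (English): Cache Prefetch Techniques and Bus Bridge Design in SOC
Advisor: 任建葳
Advisor (English): Chein-Wei Jen
Degree: Master's
Institution: National Chiao Tung University (國立交通大學)
Department: Department of Electronics Engineering (電子工程系)
Discipline: Engineering
Field: Electrical and Computer Engineering
Thesis type: Academic thesis
Year of publication: 2002
Academic year of graduation: 90 (ROC calendar)
Language: English
Number of pages: 97
Keywords (Chinese): 快取記憶體, 預取, 系統排線, 時距
Keywords (English): Cache, Prefetching, Bridge, System Bus, Time Stride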
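Cache prefetching has long been used to reduce the cache miss rate and the memory access latency seen by the processor. It lets a smaller cache perform like a larger one, which in turn allows the cache size to be reduced.
Although cache prefetching lowers the miss rate, its prefetch requests increase the data traffic on the system bus. In embedded systems built around a shared system bus, this extra traffic causes bus congestion. Prefetching may therefore lower the miss rate while overall system performance still drops, because data cannot move smoothly across the bus.
Based on cycle-accurate simulation, this thesis analyzes how key system parameters affect cache prefetching and overall performance. It proposes a cache prefetching technique that uses access timing information, together with a bus bridge that can reorder access requests, to address the problems prefetching runs into under bus congestion. According to the simulation results, under the default parameters and background conditions, the proposed prefetching technique combined with the reordering bus bridge reduces average data read time by 8.8% on average and lowers the cache miss rate by 90%, compared with a system without cache prefetching.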

Cache prefetching has long been known to reduce the cache miss rate and to hide the memory access latency seen by the processor in a processor-based system. A smaller cache with a prefetch mechanism can therefore achieve the same miss rate as a larger cache without prefetching, which reduces cache hardware cost.
Although a lower miss rate improves cache performance, the extra prefetch memory requests increase overall system bus traffic. The increased traffic can degrade the overall performance of a system even when the miss rate falls. In an embedded SOC, more devices share the system bus, so heavy bus traffic limits the benefit of applying cache prefetching techniques. Because hardware prefetching uses run-time information and can take the system bus status into account, it is better suited to embedded systems with multiple master devices.
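The trade-off between miss rate and bus traffic can be illustrated with a small analytic model. This is a sketch under stated assumptions, not the thesis's evaluation method: the function `avg_access_time`, the queueing-style waiting term, and every number below are hypothetical and chosen only to show the effect.

```python
def avg_access_time(hit_time, miss_rate, mem_latency, bus_service, bus_util):
    """Average data access time in cycles for a simple, assumed model.

    Each miss pays the memory latency plus a bus waiting time of
    bus_service * util / (1 - util), the M/M/1 waiting-time form.
    The queueing form is an assumption made for illustration.
    """
    if not 0.0 <= bus_util < 1.0:
        raise ValueError("bus_util must be in [0, 1)")
    bus_wait = bus_service * bus_util / (1.0 - bus_util)
    return hit_time + miss_rate * (mem_latency + bus_wait)


# Illustrative numbers only; none of them come from the thesis.
no_prefetch = avg_access_time(1, 0.10, 20, 8, 0.30)
light_bus = avg_access_time(1, 0.05, 20, 8, 0.40)  # prefetching, bus mostly free
heavy_bus = avg_access_time(1, 0.05, 20, 8, 0.95)  # prefetching, bus congested

print(f"no prefetch: {no_prefetch:.2f} cycles")
print(f"prefetch, light bus load: {light_bus:.2f} cycles")
print(f"prefetch, heavy bus load: {heavy_bus:.2f} cycles")
```

In this toy model, halving the miss rate helps while the bus has spare capacity, but the same miss rate yields a longer average access time than the unprefetched case once the bus is close to saturation.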
In this thesis, we investigate the characteristics of several hardware cache prefetching techniques. We then propose a new cache prefetching scheme, reference time stride prefetch (RTSP), which incorporates access timing information, and a system bus bridge with access reordering for the processor, to solve the bus congestion problem.
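The abstract does not give the internals of RTSP or of the bridge, so the following is only a generic sketch of the two ideas it names: stride-based prefetching and request reordering. The classes `StridePrefetcher` and `ReorderingBridge`, the unbounded table, the "same stride twice" confidence rule, and the demand-before-prefetch policy are all assumptions. The timing information that RTSP incorporates is not modeled.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Entry:
    last_addr: int
    stride: int = 0
    confident: bool = False  # True once the same non-zero stride repeats


class StridePrefetcher:
    """Per-instruction stride table (hypothetical, unbounded for simplicity)."""

    def __init__(self):
        self.table = {}

    def access(self, pc, addr):
        """Record a load at `pc` to `addr`; return a prefetch address or None."""
        entry = self.table.get(pc)
        if entry is None:
            self.table[pc] = Entry(last_addr=addr)
            return None
        stride = addr - entry.last_addr
        entry.confident = stride != 0 and stride == entry.stride
        entry.stride = stride
        entry.last_addr = addr
        return addr + stride if entry.confident else None


class ReorderingBridge:
    """Queues bus requests and serves demand misses first (assumed policy)."""

    def __init__(self):
        self.demand = deque()
        self.prefetch = deque()

    def push(self, addr, is_prefetch):
        (self.prefetch if is_prefetch else self.demand).append(addr)

    def pop(self):
        """Return the next address to put on the bus, or None if idle."""
        if self.demand:
            return self.demand.popleft()
        if self.prefetch:
            return self.prefetch.popleft()
        return None


prefetcher = StridePrefetcher()
predictions = [prefetcher.access(0x100, a) for a in (0, 64, 128, 192)]

bridge = ReorderingBridge()
bridge.push(192, is_prefetch=True)
bridge.push(4096, is_prefetch=False)
order = [bridge.pop(), bridge.pop(), bridge.pop()]
```

Here the prefetcher stays silent until the 64-byte stride has been observed twice, and the bridge lets the later demand request overtake the earlier prefetch, so speculative traffic does not delay a request the processor is stalled on.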
The effect of each relevant parameter, and how prefetching affects an embedded system, are revealed by cycle-by-cycle trace-driven simulation of an embedded system model with an ARM7TDMI core and an AHB system bus. The simulation results show that RTSP reduces average data reference time by 8.8% and the data miss rate by more than 90% compared with a system without prefetching.
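To make the idea of trace-driven evaluation concrete, here is a minimal sketch that replays an address trace through a direct-mapped cache with optional prefetch-on-miss of the next block. It is not cycle-accurate and does not model the ARM7TDMI core or the AHB bus. The function `simulate`, the cache geometry, and both synthetic traces are assumptions for illustration.

```python
def simulate(trace, num_lines=64, line_size=16, prefetch=False):
    """Replay byte addresses in `trace` through a direct-mapped cache.

    Returns (miss_rate, bus_transfers). With prefetch=True, each demand
    miss on block b also fetches block b + 1 if it is not already cached.
    """
    if len(trace) == 0:
        raise ValueError("trace must not be empty")
    tags = [None] * num_lines  # block number currently held by each line
    misses = 0
    transfers = 0

    def fill(block):
        nonlocal transfers
        index = block % num_lines
        if tags[index] != block:
            tags[index] = block
            transfers += 1  # one block moved over the bus

    for addr in trace:
        block = addr // line_size
        if tags[block % num_lines] != block:
            misses += 1
            fill(block)
            if prefetch:
                fill(block + 1)
    return misses / len(trace), transfers


sequential = range(0, 4096, 4)  # word-by-word walk through memory
strided = range(0, 8192, 32)    # touches every second 16-byte block

for name, trace in (("sequential", sequential), ("strided", strided)):
    for pf in (False, True):
        rate, traffic = simulate(trace, prefetch=pf)
        print(f"{name:10s} prefetch={pf!s:5s} miss_rate={rate:.3f} transfers={traffic}")
```

On the sequential trace the prefetch halves the miss rate at no extra traffic. On the strided trace every prefetched block is unused, so the miss rate is unchanged while bus transfers double, which is the kind of wasted traffic that hurts a shared bus.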

CONTENTS
LIST OF FIGURES
LIST OF TABLES
CHAPTER 1 INTRODUCTION
1.1 Introduction To Data Cache Prefetch
1.2 Cache Prefetch Issues in Embedded System
1.3 Motivation
1.4 Thesis Organization
CHAPTER 2 DYNAMIC CACHE PREFETCH TECHNIQUES
2.1 Overview of Dynamic Cache Prefetch Techniques
2.2 Various Dynamic Prefetch Scheme Design
2.2.1 One Block Lookahead Prefetch
2.2.2 Neighbor Prefetch
2.2.3 Baseline SPT Prefetch
2.2.4 RPT Prefetch
2.2.5 Conflict Miss Time-Stride Prefetch
2.3 Reference Time-Stride Prefetch
CHAPTER 3 ARCHITECTURE DESIGN
3.1 Overview of the Architecture Platform
3.2 Baseline System Environment
3.2.1 Processor Model
3.2.1.1 ARM7TDMI Core
3.2.1.2 Separate Instruction/Data Cache
3.2.1.3 Baseline Bridge
3.2.2 IP Model
3.2.3 AHB Bus
3.2.4 Main Memory Model
3.3 The Reference Time-Stride Prefetch Scheme
3.3.1 The Prefetcher
3.3.1.1 Reference Time-Stride
3.3.1.2 Multiple Prefetch Requests
3.3.1.3 Lifetime Computation
3.3.2 The Processor Bus Bridge With Access Reordering
3.3.2.1 Memory Request Queue (MRQ)
3.3.2.2 Prefetch Read Queue (PRQ)
3.4 Hardware Cost Estimation
CHAPTER 4 RESULTS & ANALYSIS
4.1 Verification Environment
4.2 The Evaluation Metrics
4.2.1 Overall Performance
4.2.2 Prefetch Scheme Performance
4.2.3 Bus Traffic Evaluation
4.3 The Simulation Results
4.3.1 Benchmarks
4.3.1.1 FDCT
4.3.1.2 DWT
4.3.1.3 Motion Estimation
4.3.2 Default Simulation System Parameter Specifications
4.3.3 Cache Size
4.3.4 Memory Access Latency
4.3.5 IP Model Access Characteristics
4.3.5.1 IP Model Enable Probability
4.3.5.2 Number of Accesses of the IP Model
4.3.6 RPT Size
4.3.7 MRQ Size
4.3.8 Summary
CHAPTER 5 CONCLUSION & FUTURE WORK
REFERENCE

