跳到主要內容

臺灣博碩士論文加值系統

(2600:1f28:365:80b0:879a:e16d:38fe:36d8) 您好!臺灣時間:2024/12/13 07:36
字體大小: 字級放大   字級縮小   預設字形  
回查詢結果 :::

詳目顯示

我願授權國圖
: 
twitterline
研究生:陳熙君
研究生(外文):Hsi-Chiuen Chen
論文名稱:以快取記憶體分割及陣列映射來減少快取記憶體之碰撞
論文名稱(外文):Reducing Cache Conflicts by Multi-Level Cache Partitioning and Array Elements Mapping
指導教授:許健平許健平引用關係曾煜棋曾煜棋引用關係
指導教授(外文):Jang-Ping SheuYu-Chee Tseng
學位類別:碩士
校院名稱:國立中央大學
系所名稱:資訊工程研究所
學門:工程學門
學類:電資工程學類
論文種類:學術論文
論文出版年:1999
畢業學年度:87
語文別:中文
論文頁數:32
中文關鍵詞:快取記憶體位址配置重覆使用相同快取記憶體資料使用快取記憶體中連續位址之資料
外文關鍵詞:cache localitiestemporal reusespatial reuseAtom
相關次數:
  • 被引用被引用:0
  • 點閱點閱:178
  • 評分評分:
  • 下載下載:0
  • 收藏至我的研究室書目清單書目收藏:0
單一處理器的執行效能受到快取(cache)記憶體中資料命中率(hit rates)的影響很大。在許多科學及工程的計算上,常出現因程式行為模式及陣列宣告方式的搭配不當而導致快取記憶體發生碰撞。因此,我們發展出一套方法,對最內層迴圈的存取模式作分析,並改寫程式,使其執行時大量減少快取記憶體的碰撞。我們的方法將適用於含有affine function的任意迴圈程式,根據存取資料量與快取記憶體大小的關係,我們以不同的策略來安排陣列資料在多層快取記憶體的擺放方式,以使此方法的overhead最小,而增進的效能最大。我們以direct-mapped cache 的多層快取記憶體對映方式為研究之基礎,並且以Atom1 模擬軟體為工具,撰寫一個模擬器來模擬direct-mapped cache 的多層快取記憶體對映模式。以我們所研發的方法,將陣列元素安排在快取記憶體的相對位置後,並以模擬器來觀察快取記憶體的執行狀態,實驗結果顯示,快取記憶體之未命中率(miss rate)可大幅的減低,進而縮短程式執行的時間。

This paper presents an algorithm to reduce cache conflicts and improve
cache localities. The proposed algorithm analyzes the reference pattern of
innermost loop, partitions the multi-level cache into several parts with
different size, and then maps array data onto the scheduled cache positions
such that cache conflicts can be eliminated. To reduce the memory overhead for
mapping arrays onto partitioned cache, methods for array redeclaration and
program transformation are also developed. Besides, we combine the loop tiling
and the proposed schemes to improve the cache performance for those programs
that the amount of accessing elements of innermost loop is larger than cache
partitioned size. The developed scheme can be applied to arbitrary loops
programs whose array accessing functions are affine. To demonstrate that our
approach is effective at reducing number of cache conflicts and exploiting
cache localities, we use Atom as
tool to simulate the
behavior of direct-mapped cache. Experimental results show
that applying our cache partition scheme can largely reduce the cache
conflicts and thus save program execution time in both one-level cache
and multi-level cache hierarchies.

1 Introduction
2 Basic Concept
3 Preliminaries and The Algorithms
3.1 One Level Cache Hierarchy
3.2 Multi-Level Cache Hierarchy
3.2.1 The algorithm
4 Related Works
5 Performance Study
6 Conclusion Remark and Future Work

[1] D. F. Bacon, S. L. Graham, and O. J. Sharp. Compiler transformations for high-performance computing. Technical Report UCB/CSD-93-781, Computer Science Division, University of California, Berkeley, 1993.
[2] D. F. Bacon et al., A Compiler Framework for Restructuring Data Declarations to enhance Cache and TLB Effectiveness. In CASCON'94, pp. 270-282, Apr. 1994.
[3] K. Hwang and F. A. Briggs. Computer Architecture and Parallel Processing. McGRAW-Hill, Inc. 1984.
[4] M. Lam, E. E. Rothberg, and M. E. Wolf., The Cache Performance of Blocked Algorithms. Proc. Fourth Int'l Conf. Architectural Support for Programming Languages and Operating Systems, pp. 63-74, Santa Clara Calif., 8-11 Apr. 1991.
[5] A. R. Lebeck and D. A. Wood. Cache profiling and the SPEC benchmarks: A case study. IEEE Computer, Vol. 27, No. 10, pp. 15-26, 1994.
[6] Jin-Ho Lee, Min-Young Lee, Seong-Uk Choi, and Myong-Soon Park. Reducing Cache Conflicts in Data Cache Prefetching. Computer Architecture News, vol. 22, no. 4, pp. 71-77, 1994.
[7] L. S. Liu, C. W. Ho, and J. P. Sheu. On the parallelism of nested forllps using index shift method. In Proceedings of the International Conference on Parallel Processing, Vol. II, pp. 119-123, Aug. 1990.
[8] Naraig Manjikian, and Tarek S. Abdelrahman. Reduction of cache conflicts in loop nests. Tech. Rep. CSRI-318, Computer Systems Research Institute, University of Toronto, Ontario, Canada, March 1995.
[9] T. Mowry. Tolerating Latency through Software-controlled Data Prefetching. PHD Dissertation, Dept. of Electrical Eng., Stanford Univ., 1994.
[10] P. R. Panda, H. Nakamura, N. D. Dutt, and A. Nicolau. Augmenting loop tiling with data alignment for improved cache performance. IEEE Transactions on Computers, Vol. 48, No. 2, Feb. 1999.
[11] S. Przybylski, M. Horowitz, and J.L. Hennessy. Performance Tradeoffs in Cache Design. Proc. 15th Symp. Computer Architecture, pp. 290-298, Honolulu, Hawaii, June 1988.
[12] Gabriel Rivera, Chau-Wen Tesig. Data Transformations for Eliminating Conflict Misses. In Proceedings of the 1998 ACM SIGPLAN Conf. on Programming Language Design and Implementation (PLDI'98), Montreal, Canada, June 1998.
[13] O. Temam, C. Fricker, and W. Jalby. Impact of cache interferences on usual numerical dense loop nests. Proceedings of the IEEE, Vol. 81, No. 8, pp. 1103-1115, 1993.
[14] M.J. Wolfe. Iteration Space Tiling for Memory Hierarchies. Proc. Third SIAM Conf. Parallel Processing for Scientific Computing, pp. 357-361, Los Angeles, 1-4 Dec. 1987.
[15] M. E. Wolf and M. S. LanO. A data locality optimizing algorithm. In Proceedings of ACM SIGPLAN'91 Conference on Programming Language Design and Implementation, pp. 30-44, June 1991.
[16] David C. Wong, Edwar W. Davis, and Jeffrey O. Young. A Software Approach to Avoiding Spatial Cache Collisions in Parallel Processor Systems. IEEE Transactions on Parallel and Distributed Systems, vol. 9, no. 6, pp. 601-608, June 1998.

QRCODE
 
 
 
 
 
                                                                                                                                                                                                                                                                                                                                                                                                               
第一頁 上一頁 下一頁 最後一頁 top