臺灣博碩士論文加值系統 (National Digital Library of Theses and Dissertations in Taiwan)

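Graduate student: 郭良宇
Graduate student (romanized): Liang-Yu Kuo
Thesis title: 多核心系統之雙階層快取記憶體之設計
Thesis title (English): A two-level cache design for multi-core system
Advisor: 陳中和
Advisor (romanized): Chung-Ho Chen
Degree: Master's
Institution: National Cheng Kung University
Department: Institute of Computer and Communication Engineering
Discipline: Engineering
Field: Electrical engineering and computer science
Thesis type: Academic thesis
Year of publication: 2009
Graduation academic year: 97 (ROC calendar)
Language: English
Number of pages: 43
Chinese keywords (translated): data coherence; multi-core system; cache memory
English keywords: multi-core system; coherence; cache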
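Abstract (translated from the Chinese): As the demand for computing power keeps growing, single-core processor performance has hit a technical bottleneck, and multi-core systems have become the most suitable way to improve performance. In a shared-memory multi-core system the cache plays an important role: it must use a coherence protocol to control communication among the processors and guarantee data coherence, while also reducing the latency of processor memory accesses. This study implements a two-level cache for Symphony32, an ARM-like superscalar processor with a 9-stage pipeline. The second-level cache is used as shared memory, which reduces the load on the system bus by 49.56% on average. After analyzing the relationship between the interconnection of the private caches and the coherence protocol in a multi-core system, the study optimizes both the cache and the interconnect. For the cache, dual-ported access improves performance by about 10.24% on average. In addition, the data channel is widened to match the cache block size so that shared data can move quickly between caches over this channel. This reduces the time that multithreaded programs running on multiple cores spend waiting for modified shared data to be written back, and it also improves the performance of ordinary data accesses by about 8.59% on average.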
Abstract (English): As the demand for computing power grows, multi-core systems have become the most appropriate architecture, because improving the performance of a single-core system has run into a bottleneck. Cache memories play an important role in a shared-memory multi-core system, since private caches must keep shared data consistent through a coherence protocol. This study implements a two-level cache system for Symphony32, an ARM-like superscalar processor with a 9-stage pipeline. In our design, the L2 cache is treated as a shared cache and reduces traffic on the system bus by 49.56% on average. After analyzing the relationship between the interconnection and the cache coherence protocol, we present a two-level cache design that enhances the performance of both the cache and the interconnection. To enhance the cache, a dual-ported data cache is used, which improves performance by 10.24%. Moreover, the data channel is widened to the line size of the cache so that data can be transferred between caches rapidly, which improves data access performance by 8.59%.
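The abstract does not spell out the coherence protocol or the channel parameters, so the following is only a minimal sketch of the two ideas it mentions: private caches kept coherent by snooping, and a data channel as wide as a cache line. It models textbook MESI for a single cache line and is not the thesis's implementation. The names `SnoopBus` and `beats`, the assumption that any valid peer copy is forwarded cache-to-cache, and the 32-byte line and 4-byte channel sizes in the usage note are all hypothetical. Write-backs to the shared level are not modelled.

```python
from enum import Enum


class State(Enum):
    M = "Modified"
    E = "Exclusive"
    S = "Shared"
    I = "Invalid"


class SnoopBus:
    """Toy snooping bus tracking one cache line across private caches.

    Assumption (not from the thesis): a peer holding a valid copy
    supplies the line directly instead of the shared level.
    """

    def __init__(self, n_caches):
        self.states = [State.I] * n_caches
        self.cache_to_cache = 0  # line fills supplied by a peer cache
        self.shared_fetches = 0  # line fills supplied by the shared level

    def _fill(self, peers):
        # Count where the requested line comes from.
        if peers:
            self.cache_to_cache += 1
        else:
            self.shared_fetches += 1

    def read(self, cid):
        if self.states[cid] is not State.I:
            return  # read hit, no bus transaction
        peers = [i for i, s in enumerate(self.states) if s is not State.I]
        self._fill(peers)
        for i in peers:
            self.states[i] = State.S  # M/E/S holders downgrade to Shared
        self.states[cid] = State.S if peers else State.E

    def write(self, cid):
        current = self.states[cid]
        if current is State.M:
            return  # write hit on an already modified line
        if current is State.E:
            self.states[cid] = State.M  # silent upgrade, no bus traffic
            return
        peers = [i for i, s in enumerate(self.states)
                 if i != cid and s is not State.I]
        if current is State.I:
            self._fill(peers)  # write miss needs the line first
        for i in peers:
            self.states[i] = State.I  # invalidate every other copy
        self.states[cid] = State.M


def beats(line_bytes, channel_bytes):
    """Number of channel transfers needed to move one cache line."""
    return -(-line_bytes // channel_bytes)  # ceiling division
```

Under these assumptions, a line that core 0 has modified reaches core 1 through one cache-to-cache fill. With a hypothetical 32-byte line, `beats(32, 4)` gives 8 transfers on a 4-byte channel, while `beats(32, 32)` gives a single transfer on a line-wide channel, which is the kind of saving the abstract attributes to widening the data channel.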
Abstract (Chinese)
Abstract (English)
Contents
Chapter 1 - Introduction
1.1 Motivation
1.2 Contribution
1.3 Organization of the thesis
Chapter 2 - Background and related work
2.1 Interconnection network topology
2.2 Cache coherency problem
2.2.1 Directory-based protocol
2.2.2 Token-based protocol
2.2.3 Snooping-based protocol
2.3 Multi-level cache coherence
2.4 Related work
Chapter 3 - System evaluation and implementation
3.1 Interconnection topology and coherence protocol
3.2 Multilevel cache organization
3.3 Coherence bus design
3.4 MESI protocol
3.5 Design of dual-ported cache
3.6 Summary
Chapter 4 - Verification environment and method
4.1 Hardware platform
4.2 Software platform and benchmarks
4.2.1 Software tool chain
4.2.2 Benchmarks
4.2.2.1 Benchmarks of single-core system
4.2.2.2 Benchmarks of multi-core system
Chapter 5 - Experiment results
5.1 Performance measurement result
5.2 Synthesis result
Chapter 6 - Conclusion and future work
6.1 Conclusion
6.2 Future work
References