
National Digital Library of Theses and Dissertations in Taiwan

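Graduate student: 郭婷吟
Graduate student (English): Ting-yin Kuo
Thesis title: 使用晶片網路輔助之儲存器借用機制以增進叢聚式多核心系統之非一致性快取記憶體架構效能
Thesis title (English): Enhancing Performance of NUCA for Cluster-based Many-Core System Using Bank Loan Mechanism Assisted by Network-on-Chip
Advisor: 張貴忠
Advisor (English): Kuei-Chung Chang
Degree: Master's
Institution: Feng Chia University
Department: Information Engineering (graduate program)
Discipline: Engineering
Field: Electrical Engineering and Computer Science
Thesis type: Academic thesis
Year of publication: 2012
Graduation academic year: 100 (ROC calendar)
Language: Chinese
Pages: 32
Keywords (Chinese): 快取區塊 (cache bank); 晶片多處理器系統 (chip multiprocessor system); 非單一快取記憶體架構 (non-uniform cache architecture); 晶片內網路 (network-on-chip); 借用策略 (loan policy)
Keywords (English): loan mechanism; cache bank; Chip Multiprocessors; NUCA; Network-on-chip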
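In chip multiprocessors (CMPs), the large speed gap between processors and memory makes the performance of the last-level cache a key design concern. In many embedded applications, the Non-Uniform Cache Architecture (NUCA) has been adopted to overcome the data-access performance limits of CMPs. This design approach applies networking concepts to the cache and can shorten the average time each core takes to access memory data.
Because a CMP can host several applications at once, combining NUCA with a Network-on-Chip (NoC) design can further improve application performance at run time. Although many papers have studied how to allocate cache among applications under NUCA, very few address cache resource allocation for a single application on a CMP. This thesis therefore focuses on the single-application case: how to raise the cache utilization of the application's threads and thereby improve the application's performance.
In this thesis, we assume that each core of the CMP runs one thread of the application. In the proposed method, a core whose own cache space is insufficient can borrow cache space from cores that use their cache less. We also apply an offset calculation to all cache addresses within the same cluster, so that the borrowed locations are not themselves data hot spots. Together, these mechanisms improve the execution performance of the whole application.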
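The borrowing step described above can be sketched as follows. This is only an illustration under stated assumptions, not the thesis's implementation: the `Core` record, its `accesses` and `misses` counters, the `miss_threshold` value, and the rule "borrow from the least-accessed bank in the same cluster" are all hypothetical choices made for the example.

```python
from dataclasses import dataclass


@dataclass
class Core:
    core_id: int
    cluster_id: int
    accesses: int  # hypothetical counter: recent accesses to this core's own bank
    misses: int    # hypothetical counter: recent misses in this core's own bank


def miss_rate(core):
    """Miss rate of a core's own bank; 0.0 if the bank has not been accessed."""
    return core.misses / core.accesses if core.accesses else 0.0


def pick_lender(borrower, cores, miss_threshold=0.2):
    """Return the least-used core in the borrower's cluster, or None.

    Assumed policy: a core asks for a loan only when its miss rate reaches
    miss_threshold, and only borrows from a bank less busy than its own.
    """
    if miss_rate(borrower) < miss_threshold:
        return None  # the borrower's own bank is coping; no loan needed
    candidates = [
        c for c in cores
        if c.cluster_id == borrower.cluster_id and c.core_id != borrower.core_id
    ]
    if not candidates:
        return None
    lender = min(candidates, key=lambda c: c.accesses)
    return lender if lender.accesses < borrower.accesses else None
```

Restricting candidates to the borrower's own cluster mirrors the abstract's statement that banks are borrowed within the same group; the thresholds and counters would have to come from the thesis itself.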
In chip multiprocessors (CMPs), the large performance gap between processor and memory makes the last-level cache one of the key design issues. The Non-Uniform Cache Architecture (NUCA) has been used in many embedded applications to overcome the data-access performance limits of CMPs.
Multiple applications can run simultaneously on a chip multiprocessor SoC. A NUCA cache implemented over a Network-on-Chip (NoC) can further improve data-access performance on a multi-core SoC.
Although much research has proposed cache partitioning for multiple applications in NUCA systems, resource allocation and management for a single application on a chip multiprocessor has rarely been addressed.
In this thesis, we propose a bank loan mechanism that reduces the cache miss rate by letting a core borrow cache banks from other cores in the same cluster. In addition, we shift the set index of each cache bank in the same cluster to stagger cache data locations. The proposed mechanism reduces the cache miss rate and enhances NUCA performance.
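To make the set-index shift concrete, here is a minimal sketch of one way banks in a cluster could stagger their set indices. The abstract does not say how the shift is computed, so the per-bank offset (`bank_pos * num_sets // banks_per_cluster`), the 64-byte block size, and the 256-set bank are assumptions chosen for illustration.

```python
def set_index(addr, block_bits=6, index_bits=8):
    """Conventional set index: drop the block offset, keep index_bits bits."""
    return (addr >> block_bits) & ((1 << index_bits) - 1)


def shifted_set_index(addr, bank_pos, banks_per_cluster=4,
                      block_bits=6, index_bits=8):
    """Set index used by the bank at position bank_pos within its cluster.

    Assumed scheme: each bank adds a fixed offset to the conventional index,
    so an address that is hot in one bank maps to a different set when it is
    placed in a borrowed bank of the same cluster.
    """
    num_sets = 1 << index_bits
    stride = num_sets // banks_per_cluster  # hypothetical per-bank offset
    base = set_index(addr, block_bits, index_bits)
    return (base + bank_pos * stride) % num_sets
```

With these assumed parameters, address `0x1234` maps to set 72 in the bank at position 0 and to sets 136, 200, and 8 in the banks at positions 1 to 3, so the same address never lands on the same set index in two banks of the cluster.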
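Table of Contents
Acknowledgments
Abstract (Chinese)
Abstract
List of Figures
Chapter 1 Introduction
1.1 Foreword
1.2 Research Motivation and Objectives
1.3 Thesis Organization
Chapter 2 Related Work
2.1 Overview of Multi-Core Systems
2.2 Network-on-Chip (NoC)
2.3 Non-Uniform Cache Architecture (NUCA)
2.4 Cache Space Allocation Design
Chapter 3 Proposed Method
3.1 NoC-based NUCA Architecture
3.2 Core Clustering Strategy
3.3 Cache Bank Loan Mechanism
Chapter 4 Experimental Results
4.1 Experimental Environment
4.2 Simulation Results: Miss Rate
4.3 Simulation Results: Execution Time
4.4 Simulation Results: Cache Size
Chapter 5 Conclusion
References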