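Graduate student: 楊琳淳
Graduate student (English): Lin-Chun Yong
Thesis title: 多核心架構下草稿式記憶體之即時管理
Thesis title (English): Online Scratchpad Memory Management for Multi-core Systems
Advisor: 張大緯
Advisor (English): Da-Wei Chang
Degree: Master's
Institution: National Cheng Kung University (國立成功大學)
Department: Department of Computer Science and Information Engineering
Discipline: Engineering
Field: Electrical and Computer Engineering
Thesis type: Academic thesis
Year of publication: 2015
Graduation academic year: 103 (ROC calendar)
Language: English
Number of pages: 45
Keywords (Chinese): 記憶體配置, 草稿式記憶體, 多核心
Keywords (English): memory allocation, scratchpad memory, multicore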
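Because scratchpad memory consumes less power and occupies less area than a conventional cache, it is increasingly common in embedded systems. Alongside the growing adoption of scratchpad memory, multicore architectures have become the trend for most embedded systems as technology advances; accordingly, prior work has combined multicore architectures with scratchpad memory to reduce power consumption and improve execution speed. In a multicore architecture, multiple applications can run concurrently, so allocating memory resources according to each program's memory demand can raise memory utilization and lower system energy consumption. To learn a program's memory demand, existing approaches must analyze the program's behavior before execution and then allocate memory resources based on that analysis. However, an allocation derived from pre-execution analysis may not match the program's demand at run time, because the input supplied by the user at run time can differ from the data used in the earlier analysis. We therefore propose a scratchpad memory management strategy that requires no prior analysis: it analyzes each program's memory demand at run time and allocates memory resources according to that demand. Compared with [8], our method improves performance by 29% on average.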
Scratchpad memory (SPM), a software-controlled on-chip memory, is increasingly used in embedded systems because it has lower access energy and higher area density than an ordinary cache. To reduce energy consumption, multicore architectures that replace caches with SPMs have been proposed. In such an architecture, tasks can access data on both local and remote SPMs, but accessing a remote SPM takes longer than accessing the local one, so SPM allocation affects task execution time. SPM allocation is also an important factor in reducing energy consumption. Existing methods for allocating SPM space require offline program profiling, which can lead to inefficient SPM utilization when the program receives a different input at run time or when the profile is unaware of the SPM state in the runtime environment. This paper proposes SAMOS, a novel online method that allocates SPM space on multicore systems without offline profiling. SAMOS partitions the SPM according to the dynamic access behavior of the running tasks. Because partitioning may lengthen execution time when data must be accessed on a remote SPM, SAMOS minimizes the execution time of each task by placing more frequently used data in the local SPM and less frequently used data in remote SPMs. SAMOS is implemented through cooperation between hardware and software. The evaluation results show that SAMOS reduces the energy-delay product (EDP) by up to 57% (29% on average) compared with CASA, a contention-aware SPM allocation policy. Moreover, the area overhead of the hardware support is insignificant (about 1.8%).
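The placement idea in the abstract (hot data in the local SPM, colder data in remote SPMs) and the EDP metric can be illustrated with a minimal sketch. This is not the SAMOS algorithm itself: the function names, the page-granularity capacities, and the greedy hot-first ranking are all assumptions made for illustration, and the record does not describe how SAMOS actually ranks or migrates pages.

```python
from typing import Dict, List, Tuple


def place_pages(access_counts: Dict[int, int],
                local_capacity: int,
                remote_capacity: int) -> Tuple[List[int], List[int], List[int]]:
    """Greedy hot-first placement (hypothetical helper, not from the thesis).

    access_counts maps a page id to its observed access count.
    Capacities are given in pages. Returns (local, remote, off_chip) page lists.
    """
    # Rank pages from most to least accessed; ties are broken by page id
    # so the result is deterministic.
    ranked = sorted(access_counts, key=lambda p: (-access_counts[p], p))
    # The hottest pages go to the local SPM, the next tier to remote SPM,
    # and whatever is left stays off chip.
    local = ranked[:local_capacity]
    remote = ranked[local_capacity:local_capacity + remote_capacity]
    off_chip = ranked[local_capacity + remote_capacity:]
    return local, remote, off_chip


def energy_delay_product(energy_joules: float, delay_seconds: float) -> float:
    """EDP is energy multiplied by execution time; lower is better."""
    return energy_joules * delay_seconds


if __name__ == "__main__":
    counts = {1: 5, 2: 50, 3: 20, 4: 1}
    print(place_pages(counts, local_capacity=1, remote_capacity=2))
    print(energy_delay_product(2.0, 3.0))
```

A real online policy would also have to account for the cost of moving pages between SPMs and for contention among tasks, which this sketch ignores.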
ABSTRACT (CHINESE) I
ABSTRACT II
ACKNOWLEDGMENTS III
CONTENT IV
LIST OF TABLES V
LIST OF FIGURES VI
Chapter 1 INTRODUCTION 1
Chapter 2 TARGET ARCHITECTURE and EXECUTION MODEL 6
Chapter 3 BACKGROUND and RELATED WORK 7
3.1 Static Allocation Method 7
3.2 Dynamic Allocation Method 8
Chapter 4 DESIGN of SAMOS 11
4.1 SPM Partitioner 12
4.2 Reducing remote SPM accesses 20
4.2.1 Free Remote SPM Space First (FRES) 21
4.2.2 Get Local SPM Space First (GLOS) 23
4.3 Page Access Bookkeeping Circuits (PABC) 26
4.4 Control Flow of SAMOS 27
Chapter 5 PERFORMANCE EVALUATION 29
5.1 Simulation Environment 29
5.2 Effectiveness of SPM partitioner 32
5.3 Effectiveness of SPM allocator 34
5.4 EDP Comparison of Different SPM Replacement Policies 38
5.5 Period of Recording Cache Miss Frequency 39
5.6 Random Sampling Probability 39
5.7 Overhead 40
Chapter 6 CONCLUSION 42
REFERENCES 44
[1]T. Mück and A. Frohlich, A run-time memory management approach for scratch-pad-based embedded systems, in Proc. ETFA, 2010, pp. 1-4.
[2]M. Verma and P. Marwedel, Overlay techniques for scratchpad memories in low power embedded processors, IEEE Trans. Very Large Scale Integration (VLSI) Systems, vol. 14, pp. 802-815, 2006.
[3]R. Banakar, S. Steinke, B. S. Lee, M. Balakrishnan, and P. Marwedel, Scratchpad memory: a design alternative for cache on-chip memory in embedded systems, in Proc. ACM CODES, 2002, pp. 73-78.
[4]M. Kandemir, I. Kadayif, A. Choudhary, J. Ramanujam, and I. Kolcu, Compiler-directed scratch pad memory optimization for embedded multiprocessors, IEEE Trans. Very Large Scale Integration (VLSI) Systems, vol. 12, pp. 281-287, 2004.
[5]L. Xue, F. Li, M. Kandemir, and I. Kolcu, Dynamic Partitioning of Processing and Memory Resources in Embedded MPSoC Architectures, in Proc. IEEE DATE, 2006, pp. 690-695.
[6]M. Kandemir, O. Ozturk, and M. Karakoy, Dynamic on-chip memory management for chip multiprocessors, in Proc. ACM CASES, 2004, pp. 14-23.
[7]H. Takase, H. Tomiyama, and H. Takada, Partitioning and allocation of scratch-pad memory for priority-based preemptive multi-task systems, in Proc. IEEE DATE, 2010, pp. 1124-1129.
[8]D. Chang, I. Lin, Y. Chien, C. Lin, A. Su, and C. Young, CASA: Contention-Aware Scratchpad Memory Allocation for Online Hybrid On-Chip Memory Management, IEEE Trans. Computer-Aided Design of Integrated Circuits and Systems, vol. 33, pp. 1806-1817, 2014.
[9]W. Ji, N. Deng, F. Shi, Q. Zuo, and J. Li, Dynamic and adaptive SPM management for a multi-task environment, Journal of Systems Architecture, vol. 57, pp. 181-192, 2011.
[10]B. Egger, J. Lee, and H. Shin, Scratchpad memory management in a multitasking environment, in Proc. ACM EMSOFT, 2008, pp. 265-274.
[11]Z. H. Chen and A. W. Su, A hardware/software framework for instruction and data scratchpad memory allocation, ACM Trans. TACO, vol. 7, p. 2, 2010.
[12]P. R. Panda, N. D. Dutt, and A. Nicolau, Efficient utilization of scratch-pad memory in embedded processor applications, in Proc. IEEE European conference on Design and Test, 1997, pp. 7-11.
[13]S. Steinke, L. Wehmeyer, B. S. Lee, and P. Marwedel, Assigning program and data objects to scratchpad for energy reduction, in Proc. IEEE DATE, 2002, pp. 409-415.
[14]O. Avissar, R. Barua, and D. Stewart, An optimal memory allocation scheme for scratch-pad-based embedded systems, ACM Trans. TECS, vol. 1, pp. 6-26, 2002.
[15]S. Meftali, F. Gharsalli, F. Rousseau, and A. A. Jerraya, An optimal memory allocation for application-specific multiprocessor system-on-chip, in Proc. ACM ISSS, 2001, pp. 19-24.
[16]V. Suhendra, C. Raghavan, and T. Mitra, Integrated scratchpad memory optimization and task scheduling for MPSoC architectures, in Proc. ACM CASES, 2006, pp. 401-410.
[17]S. Chattopadhyay and A. Roychoudhury, Static bus schedule aware scratchpad allocation in multiprocessors, in Proc. ACM LCTES, 2011, pp. 11-20.
[18]J. Hu, C. J. Xue, Q. Zhuge, W. C. Tseng, and E. M. Sha, Towards energy efficient hybrid on-chip scratch pad memory with non-volatile memory, in Proc. IEEE DATE, 2011, pp. 1-6.
[19]L. A. Bathen and N. Dutt, HaVOC: A hybrid memory-aware virtualization layer for on-chip distributed ScratchPad and Non-Volatile Memories, in Proc. ACM DAC, 2012, pp. 447-452.
[20]M. Kandemir, J. Ramanujam, and A. Choudhary, Exploiting shared scratch pad memory space in embedded multiprocessor systems, in Proc. ACM DAC, 2002, pp. 219-224.
[21]M. Kandemir, J. Ramanujam, J. Irwin, N. Vijaykrishnan, I. Kadayif, and A. Parikh, Dynamic management of scratch-pad memory space, in Proc. ACM DAC, 2001, pp. 690-695.
[22]L. A. D. Bathen, N. D. Dutt, D. Shin, and S. S. Lim, SPMVisor: dynamic scratchpad memory virtualization for secure, low power, and high performance distributed on-chip memories, in Proc. CODES+ISSS, 2011, pp. 79-88.
[23]A. Marongiu and L. Benini, Efficient OpenMP support and extensions for MPSoCs with explicitly managed memory hierarchy, in Proc. IEEE DATE, 2009, pp. 809-814.
[24]Y. Guo, Q. Zhuge, J. Hu, J. Yi, M. Qiu, and E. M. Sha, Data Placement and Duplication for Embedded Multicore Systems With Scratch Pad Memory, IEEE Trans. Computer-Aided Design of Integrated Circuits and Systems, vol. 32, pp. 809-817, 2013.
[25]W. Hu, G. Wang, J. Chen, X. Lou, and T. Chen, Efficient scratchpad memory management based on multi-thread for MPSoC architecture, in Proc. Scalable Computing and Communications, 2009, pp. 429-434.
[26]J. Paul, W. Stechele, M. Kroehnert, and T. Asfour, Improving Efficiency of Embedded Multi-core Platforms with Scratchpad Memories, in Proc. VDE ARCS, 2014, pp. 1-8.
[27]A. Marongiu and L. Benini, An OpenMP compiler for efficient use of distributed scratchpad memory in MPSoCs, IEEE Trans. Computers, vol. 61, pp. 222-236, 2012.
[28]Y. Etsion and D. G. Feitelson, L1 cache filtering through random selection of memory references, in Proc. Parallel Architecture and Compilation Techniques, 2007, pp. 235-244.
[29]D. Burger and T. M. Austin, The SimpleScalar tool set, version 2.0, SIGARCH Comput. Archit. News, vol. 25, no. 3, pp. 13-25, 1997.
[30]C.-C. Huang and V. Nagarajan, ATCache: reducing DRAM cache latency via a small SRAM tag cache, in Proc. ACM PACT, 2014, pp. 51-60.
[31]N. Muralimanohar, R. Balasubramonian, and N. P. Jouppi, CACTI 6.0: A tool to understand large caches, Univ. Utah and Hewlett Packard Lab., Tech. Rep., 2009.
[32]M. R. Guthaus, J. S. Ringenberg, D. Ernst, T. M. Austin, T. Mudge, and R. B. Brown, MiBench: A free, commercially representative embedded benchmark suite, in Proc. Workload Characterization, 2001, pp. 3-14.
[33]T. Austin, E. Larson, and D. Ernst, SimpleScalar: An infrastructure for computer system modeling, IEEE Trans. Computer, vol. 35, pp. 59-67, 2002.