臺灣博碩士論文加值系統 (National Digital Library of Theses and Dissertations in Taiwan)


Detailed Record

Author: 林業峻
Author (English): Yeh-Juin Lin
Title (Chinese): 基於高性能多媒體系統處理器之減少記憶體存取架構應用
Title (English): Reducing Memory Access Overhead for High Performance Multimedia Processor
Advisor: 陳添福
Advisor (English): Tien-Fu Chen
Degree: Master's
Institution: 國立中正大學 (National Chung Cheng University)
Department: 資訊工程所 (Graduate Institute of Computer Science and Information Engineering)
Discipline: Engineering
Field: Electrical and Computer Engineering
Document type: Academic thesis
Year of publication: 2006
Academic year of graduation: 94 (2005-2006)
Language: English
Number of pages: 75
Keywords (Chinese): direct memory access, computer architecture, multimedia, cache memory, data movement
Keywords (English): computer architecture, DMA, cache, multimedia, data movement
Statistics:
  • Cited by: 0
  • Views: 349
  • Rating:
  • Downloads: 0
  • Bookmarked: 0
Abstract (Chinese, translated): In a large embedded system, memory is the primary bottleneck in cost, power consumption, and performance, so system designers devote considerable effort to the design of the memory architecture. Traditionally, the memory architecture of a programmable system has been organized as a cache hierarchy, but the widening gap between processor and memory speeds calls for memory designs customized for specific applications. In multimedia applications, moving data between off-chip and on-chip memory is a pervasive primitive operation; in other words, multimedia performance is closely tied to how well memory accesses are overlapped with computation. Because a stream of memory accesses consumes many resources and stalls the processor for long periods, the data-movement problem worsens the memory-latency problem. Many memory-latency-hiding techniques have been studied that allow several memory accesses to proceed concurrently, but none of them addresses the movement of data between designated memory locations. In this thesis we propose two main ideas, (1) data transfer and (2) cache utilization, formulate the corresponding problem of optimizing memory-access overhead in multimedia applications, and implement a dual-ported cache, a non-blocking data cache, and a programmable background load/store machine on the VisoMT UniCore architecture to demonstrate that these ideas can reduce memory-access overhead.
Abstract (English): Memory represents a major cost, power, and performance bottleneck for a large class of embedded systems, so system designers pay great attention to the design and tuning of the memory architecture early in the design process. While the traditional memory architecture for programmable systems is organized as a cache hierarchy, the widening processor/memory performance gap requires more aggressive memory configurations customized for the specific target applications. Data movement between off-chip and on-chip memory is a basic primitive operation that multimedia applications execute frequently, and the performance of these applications is known to depend heavily on the extent to which memory accesses are overlapped with useful computation. The data movement problem exacerbates the memory latency problem, since it requires a train of memory accesses that consumes most of the resources and stalls the CPU for long periods of time. Several memory latency hiding techniques have been investigated that work well for one or a few simultaneous memory accesses, but they do not address the data movement scenario between designated memory locations. In this thesis, we bring up two major considerations, (1) data transfer and (2) cache utilization, to optimize memory access overhead in multimedia applications, and we implement a dual-ported cache, a non-blocking data cache, and a programmable background load/store machine on the VisoMT UniCore architecture to reduce the number of memory accesses and hide memory latency.
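The key idea behind the programmable background load/store machine is to overlap data movement with computation. As a rough illustration only (not code from the thesis), the C sketch below shows the ping-pong double-buffering pattern such an engine enables: while the next tile is being copied from off-chip to on-chip memory, the CPU processes the tile that has already arrived. The bg_load_start and bg_load_wait calls are hypothetical placeholders for the engine interface and are stubbed with a synchronous memcpy so the sketch compiles and runs; the actual VisoMT UniCore interface is not shown.

```c
/*
 * Illustrative sketch (not from the thesis): double-buffered data movement
 * that overlaps a background transfer with computation, the general pattern
 * a background load/store engine is meant to support.
 */
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define TILE 256                       /* elements moved per background transfer */

/* Hypothetical engine interface, stubbed so the example is self-contained. */
static void bg_load_start(uint32_t *dst, const uint32_t *src, size_t n)
{
    memcpy(dst, src, n * sizeof *dst); /* stand-in: a real engine would copy in
                                          the background while the CPU computes */
}

static void bg_load_wait(void)
{
    /* stand-in: wait for (or poll) the engine's completion status */
}

static uint32_t process_tile(const uint32_t *tile, size_t n)
{
    uint32_t sum = 0;                  /* placeholder computation on on-chip data */
    for (size_t i = 0; i < n; i++)
        sum += tile[i];
    return sum;
}

int main(void)
{
    static uint32_t offchip[4 * TILE];           /* stands in for off-chip memory */
    static uint32_t onchip[2][TILE];             /* two on-chip ping-pong buffers */
    for (size_t i = 0; i < 4 * TILE; i++)
        offchip[i] = (uint32_t)i;

    uint32_t total = 0;
    int cur = 0;

    bg_load_start(onchip[cur], &offchip[0], TILE);       /* prime the first buffer */
    for (size_t t = 0; t < 4; t++) {
        bg_load_wait();                                   /* tile t is now on-chip */
        if (t + 1 < 4)
            bg_load_start(onchip[cur ^ 1],                /* fetch tile t+1 into   */
                          &offchip[(t + 1) * TILE], TILE);/* the other buffer      */
        total += process_tile(onchip[cur], TILE);         /* compute while copying */
        cur ^= 1;                                         /* swap ping/pong buffers */
    }
    printf("checksum = %u\n", (unsigned)total);
    return 0;
}
```

On a real system, the two stubs would presumably program the engine's source address, destination address, and length, then poll or take an interrupt on completion, so that process_tile runs while the next copy is still in flight.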
1. Introduction and Motivation
2. Related Work
3. Reducing Memory Access Overhead for Multimedia Application
4. Implementation
5. Experiment and Performance Evaluation
6. Conclusion