National Digital Library of Theses and Dissertations in Taiwan

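Author: 呂進德
Author (English): Chin-Te Lu
Thesis title: 長管線延遲資料路徑之高面積效率設計與實現
Thesis title (English): Area-Efficient Design and Implementation of Deep-Pipeline Latency Datapath
Advisor: 劉志尉
Advisor (English): Chih-Wei Liu
Degree: Master's
Institution: National Chiao Tung University (國立交通大學)
Department: Department of Electronics Engineering
Discipline: Engineering
Field: Electrical and Computer Engineering
Thesis type: Academic thesis
Year of publication: 2008
Academic year of graduation: 97
Language: English
Number of pages: 59
Keywords (Chinese): 長管線延遲; 資料路徑; 高面積效率
Keywords (English): deep-pipeline latency; datapath; area-efficient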
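The datapath of a processor is usually the part that most strongly affects its performance. Datapath allocation and design vary with application requirements. In general, for high-performance processors such as the Intel Pentium and the IBM Cell, designers use various VLSI techniques to raise the datapath operating frequency as far as possible; for lightweight applications such as embedded systems, on the other hand, the datapath is optimized for goals such as low power and small chip area. The same instruction set architecture calls for different datapath designs in different applications. To address this, this thesis proposes a design flow that automatically synthesizes an area-efficient datapath for different performance requirements. The area-efficient datapath generator performs two steps: optimization in the spatial dimension and optimization in the temporal dimension. It can reuse the instruction set of an existing high-performance processor, such as the IBM Cell, along with its development software and application programs, and systematically optimize the processor datapath according to the performance the application requires. In the spatial dimension, the most efficient utilization refers to shared datapaths and includes function modeling and cycle-accurate modeling. In the temporal dimension, we analyze instruction latency and systematically build mathematical equations to obtain the minimum-area micro-architecture. Taking the Cell SPU (Synergistic Processor Unit) datapath as an example, we use the proposed design flow to analyze the instruction set architecture and find the most area-efficient micro-architecture. Experiments show that, for datapath designs of embedded microprocessors from 100 MHz to 800 MHz, the proposed design flow improves area by about 20% compared with automated tools. Using this design flow, we implemented an SPU digital signal processor in a UMC 90 nm process; the chip area is 2.5 mm x 2.5 mm and the operating frequency is 400 MHz.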
The datapath is usually the most performance-critical part of a processor. Its allocation and design depend on the requirements of the target application. Generally speaking, for high-performance processors such as Intel's Pentium and IBM's Cell, designers push the operating frequency as high as possible using various VLSI techniques. In contrast, for lightweight applications such as embedded systems, the goal of datapath design is low power, small chip area, and similar properties. The same instruction set architecture (ISA) can therefore be implemented in different ways for different application requirements. This thesis proposes a design flow that automatically generates an area-efficient datapath for a given set of application requirements. The area-efficient datapath generator works in two phases: spatial optimization and temporal optimization. It systematically develops and optimizes the processor datapath while reusing the ISA of an existing high-performance processor, such as IBM's Cell, together with its software toolchain and application programs. Spatial optimization means efficient utilization in the spatial domain and comprises function modeling and cycle-accurate modeling. Temporal optimization explores instruction latency and systematically builds a mathematical formulation to obtain the optimal micro-architecture. We take the Cell Synergistic Processor Unit (SPU) as our datapath design example, analyze the optimization space of SPU ISA implementations, and find the area-efficient micro-architecture using the proposed design flow. In our experiments, for embedded-processor datapaths targeting 100 MHz to 800 MHz, the micro-architectures obtained with the proposed design flow reduce area by about 15-20% compared with those obtained with CAD tools. Finally, we use the design flow to implement an SPU DSP in the UMC 90 nm 1P9M CMOS process. The silicon area is 2.5 mm x 2.5 mm and the clock rate is 400 MHz.
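The abstract does not give the mathematical formulation used for temporal optimization. As a rough illustration of the general trade-off it describes (a lower target frequency allows fewer pipeline stages and hence less register area), here is a minimal Python sketch. It is not the thesis's model: the function names, the fixed register overhead, the per-bit register area, and all numeric values are hypothetical assumptions chosen only for illustration.

```python
import math


def min_stages(logic_delay_ns, clock_mhz, reg_overhead_ns=0.2):
    """Fewest pipeline stages so that each stage fits in one clock period.

    reg_overhead_ns is an assumed per-stage register overhead
    (setup time plus clock-to-Q), not a value taken from the thesis.
    """
    period_ns = 1000.0 / clock_mhz          # e.g. 400 MHz -> 2.5 ns
    usable_ns = period_ns - reg_overhead_ns
    if usable_ns <= 0:
        raise ValueError("clock too fast for the assumed register overhead")
    return max(1, math.ceil(logic_delay_ns / usable_ns))


def datapath_area(units, clock_mhz, reg_area_per_bit=1.0):
    """Toy area model: logic area plus pipeline registers between stages.

    Each unit is a dict with hypothetical keys:
    'delay_ns', 'logic_area', 'width_bits'.
    """
    total = 0.0
    for unit in units:
        stages = min_stages(unit["delay_ns"], clock_mhz)
        # A unit with n stages needs n - 1 internal register banks.
        total += unit["logic_area"] + (stages - 1) * unit["width_bits"] * reg_area_per_bit
    return total


# Hypothetical 128-bit functional unit with 6 ns of combinational delay.
example_unit = {"delay_ns": 6.0, "logic_area": 1000.0, "width_bits": 128}
for mhz in (100, 400, 800):
    print(mhz, min_stages(6.0, mhz), datapath_area([example_unit], mhz))
```

In this toy model the same unit needs more stages, and therefore more register area, as the target clock rises from 100 MHz to 800 MHz. A real flow would also have to account for forwarding logic, hazards, and instruction latency constraints, none of which are modeled here.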
1 INTRODUCTION
1.1 Motivation
1.2 Problem Description and Distribution
1.3 Thesis Organization
2 BACKGROUND
2.1 Cell Broadband Engine Architecture
2.2 SPU Instruction Set Architecture
2.3 SPU Micro-Architecture
3 DESIGN & OPTIMIZATION FLOW OF DEEP-PIPELINE LATENCY DATAPATH
3.1 Spatial Optimization
3.1.1 Function Modeling
3.1.2 Cycle-Accurate Modeling
3.2 Temporal Optimization
3.3 Experimental Results
4 SILICON IMPLEMENTATION
4.1 Implementation Design Flow
4.2 Implementation Result
5 CONCLUSION & FUTURE WORKS
REFERENCES