National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)

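Author: Chin-Te Lu (呂進德)
Title: Area-Efficient Design and Implementation of Deep-Pipeline Latency Datapath
Title (Chinese): 長管線延遲資料路徑之高面積效率設計與實現
Advisor: Chih-Wei Liu (劉志尉)
Degree: Master's
Institution: National Chiao Tung University (國立交通大學)
Department: Department of Electronics Engineering (電子工程系所)
Discipline: Engineering
Field: Electrical Engineering and Computer Science
Thesis type: Academic thesis
Year of publication: 2008
Language: English
Keywords: deep-pipeline latency, datapath, area-efficient
Keywords (Chinese): 長管線延遲、資料路徑、高面積效率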

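Abstract (translated from the Chinese): The datapath is usually the part of a processor that most affects its performance. Datapath configuration and design vary with application requirements. In general, for high-performance processors such as the Intel Pentium and the IBM Cell, designers use various VLSI techniques to raise the datapath operating frequency as far as possible. For lightweight applications such as embedded systems, on the other hand, the datapath is optimized for low power, small chip area, and similar goals. The same instruction set architecture (ISA) therefore calls for different datapath designs in different applications. To address this, the thesis proposes a design flow that automatically synthesizes an area-efficient datapath for a given performance requirement. The area-efficient datapath generator performs two steps: optimization in the spatial dimension and optimization in the temporal dimension. It can reuse the instruction set of an existing high-performance processor such as the IBM Cell, together with its software development tools and application programs, and systematically optimize the processor datapath for the performance the application requires. Efficient use of the spatial dimension means datapath sharing, and covers function modeling and cycle-accurate modeling. In the temporal dimension, we analyze instruction latency and systematically build mathematical equations to obtain the minimum-area micro-architecture. Taking the Cell SPU (Synergistic Processor Unit) datapath as a design example, we use the proposed flow to analyze the ISA and find the most area-efficient micro-architecture. Experiments show that, for embedded microprocessor datapaths targeting 100 MHz to 800 MHz, the proposed flow improves area by about 20% compared with automated tools. Using this flow, we implemented an SPU digital signal processor in a UMC 90 nm process; the chip area is 2.5 mm x 2.5 mm and the operating frequency is 400 MHz.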

Abstract (author's English version): The datapath is usually the most critical element affecting processor performance. The allocation and design of a datapath depend on the application requirements. Generally speaking, for high-performance processors such as Intel's Pentium and IBM's Cell, designers push the operating frequency as high as possible using a broad range of VLSI techniques. In contrast, for lightweight applications such as embedded systems, the goals of datapath design are low power, small chip area, and so on. An instruction set architecture (ISA) can therefore be implemented in different ways for different application requirements. This thesis proposes a design flow that automatically generates an area-efficient datapath for various application requirements. The area-efficient datapath generator works in two phases, spatial optimization and temporal optimization. It can systematically develop and optimize the datapath of a processor while leveraging the ISA of a high-performance processor such as IBM's Cell, along with its software toolchain and application programs. Spatial optimization means efficient utilization in the spatial domain, and includes function modeling and cycle-accurate design. In the other phase, temporal optimization explores instruction latency and systematically builds a mathematical formulation to obtain the optimal micro-architecture. We take the Cell synergistic processor unit (SPU) as our datapath design example, analyze the optimization space of implementing the SPU ISA, and find the area-efficient micro-architecture using the proposed design flow. In the experiments, for datapaths of embedded processors targeting 100 MHz to 800 MHz, the micro-architecture produced by the proposed design flow improves area by about 15-20% compared with using CAD tools. Finally, we use this design flow to implement the SPU DSP in the UMC 90 nm 1P9M CMOS process. The silicon area is 2.5 mm x 2.5 mm and the clock rate is 400 MHz.
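The trade-off behind temporal optimization can be illustrated with a toy model. This record does not give the thesis's actual mathematical formulation, so the sketch below is only an assumption-laden illustration: it supposes that logic area grows quadratically once synthesis has to speed logic up beyond its relaxed-timing delay, and that each extra pipeline stage costs a fixed register area. The function `best_pipeline`, the class `PipelineChoice`, and every numeric default are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PipelineChoice:
    stages: int   # pipeline depth
    area: float   # total area in arbitrary units


def best_pipeline(
    freq_mhz: float,
    logic_delay_ns: float = 10.0,    # hypothetical unpipelined logic delay
    reg_overhead_ns: float = 0.5,    # hypothetical register timing overhead per stage
    base_logic_area: float = 100.0,  # hypothetical logic area at relaxed timing
    reg_area: float = 10.0,          # hypothetical area of one pipeline register bank
    max_speedup: float = 2.0,        # hypothetical limit on speedup from gate sizing
    max_stages: int = 16,
) -> Optional[PipelineChoice]:
    """Return the pipeline depth with the smallest modeled area, or None if infeasible."""
    period_ns = 1000.0 / freq_mhz
    budget_ns = period_ns - reg_overhead_ns  # time left for logic in each stage
    if budget_ns <= 0:
        return None
    best = None
    for stages in range(1, max_stages + 1):
        # How much faster than relaxed timing each stage's logic must run.
        speedup = (logic_delay_ns / stages) / budget_ns
        if speedup > max_speedup:
            continue  # this depth cannot meet the clock in the toy model
        # Assumption: area grows with the square of the required speedup.
        logic_area = base_logic_area * max(1.0, speedup) ** 2
        total = logic_area + stages * reg_area
        if best is None or total < best.area:
            best = PipelineChoice(stages, total)
    return best


if __name__ == "__main__":
    for f in (100, 400, 800):
        print(f, "MHz ->", best_pipeline(f))
```

In this model a deeper pipeline relaxes per-stage timing and shrinks the logic, but adds register area, so the minimum-area depth shifts upward as the target frequency rises. A real flow would replace the quadratic area assumption with data from synthesis runs.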

1 INTRODUCTION
1.1 Motivation
1.2 Problem Description and Distribution
1.3 Thesis Organization
2 BACKGROUND
2.1 Cell Broadband Engine Architecture
2.2 SPU Instruction Set Architecture
2.3 SPU Micro-Architecture
3 DESIGN & OPTIMIZATION FLOW OF DEEP-PIPELINE LATENCY DATAPATH
3.1 Spatial Optimization
3.1.1 Function Modeling
3.1.2 Cycle-Accurate Modeling
3.2 Temporal Optimization
3.3 Experimental Results
4 SILICON IMPLEMENTATION
4.1 Implementation Design Flow
4.2 Implementation Result
5 CONCLUSION & FUTURE WORKS
REFERENCES
