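Graduate student: 楊宗憲
Graduate student (English name): Tzung-Shian Yang
Thesis title (Chinese): 協同處理資料路徑之設計與產生
Thesis title (English): Design and Generation of Coprocessing Datapath
Advisor: 任建葳
Advisor (English name): Chein-Wei Jen
Degree: Master's
Institution: National Chiao Tung University
Department: Department of Electronics Engineering
Field: Engineering
Discipline: Electrical Engineering and Computer Science
Thesis type: Academic thesis
Publication year: 2002
Graduation academic year: 90 (ROC calendar, i.e., 2001–2002)
Language: English
Pages: 123
Keywords (Chinese): 協同處理資料路徑 (coprocessing datapath)
Keywords (English): coprocessing datapath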
Usage statistics:
  • Cited by: 0
  • Views: 154
  • Downloads: 18
  • Saved to bibliographies: 0
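As multimedia applications grow more diverse, programmable system solutions have become the mainstream, and microprocessor architectures and the related process technologies keep improving to deliver the performance these applications require. Under power and cost constraints, however, a single powerful microprocessor can no longer meet the needs of consumer products, especially battery-powered portable devices. The current industry solution is a heterogeneous platform, which places dedicated auxiliary computing hardware next to the microcontroller for a target class of applications, lowering power consumption and raising computational efficiency.
Using pre-designed and pre-verified IP modules is an effective way to shorten time-to-market. However, hardware accelerator modules designed for earlier projects or licensed from third parties rarely fit the requirements of the application, and a newly designed module must be over-designed if it is to be reusable. Designing the HW/SW interface that adds such a module to the system is also very tedious. For these reasons, we propose an automatic generator of computation-accelerating datapaths (the DSP Datapath Generator).
Users specify requirements for their own applications, and the Datapath Generator automatically produces a suitable datapath to accelerate the computation, so that every unit of cost invested yields the highest possible computational efficiency and system cost is greatly reduced. On the hardware side, the generated datapath follows the industry-standard AMBA AHB, so it can be easily integrated into standard computing platforms (such as the many architectures built around ARM or PowerPC cores). We also provide an automatically generated software driver, which solves the tedious and error-prone HW/SW interface problems that system designers usually face. As a result, the complexity and design time of our proposed system design and verification flow are close to those of a typical software-only flow and add little overhead.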
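Choosing the cheapest datapath for a given speed requirement is a time-constrained scheduling problem: fix the latency in control steps, then place each operation so that as few functional units of each type as possible are busy in any one step. As a minimal sketch, the Python below uses force-directed scheduling, a classic heuristic for this problem; it is an illustration, not the generator's actual dfg2sch step. It assumes single-cycle operations, one functional-unit type per operation kind, and a hypothetical DFG encoding (a dict of operation kinds plus predecessor lists); all function names are hypothetical as well.

```python
from collections import defaultdict


def topo_order(ops, preds, succs):
    """Kahn's algorithm: return the operations in dependency order."""
    indeg = {o: len(preds[o]) for o in ops}
    ready = [o for o in ops if indeg[o] == 0]
    order = []
    while ready:
        o = ready.pop(0)
        order.append(o)
        for s in succs[o]:
            indeg[s] -= 1
            if indeg[s] == 0:
                ready.append(s)
    return order


def time_frames(ops, preds, succs, latency, fixed):
    """ASAP/ALAP frame of every (single-cycle) op, honoring already-fixed steps."""
    order = topo_order(ops, preds, succs)
    asap, alap = {}, {}
    for o in order:
        asap[o] = fixed.get(o, max((asap[p] + 1 for p in preds[o]), default=0))
    for o in reversed(order):
        alap[o] = fixed.get(o, min((alap[s] - 1 for s in succs[o]), default=latency - 1))
    if any(asap[o] > alap[o] for o in ops):
        raise ValueError("latency bound is shorter than the critical path")
    return {o: (asap[o], alap[o]) for o in ops}


def prob(frame, step):
    """Probability that an op with this frame runs in `step` (uniform over the frame)."""
    lo, hi = frame
    return 1.0 / (hi - lo + 1) if lo <= step <= hi else 0.0


def force_directed_schedule(kind, preds, latency):
    """Greedily fix the (op, step) pair with the least force until all ops are placed."""
    ops = list(kind)
    succs = {o: [] for o in ops}
    for o in ops:
        for p in preds[o]:
            succs[p].append(o)
    fixed = {}
    while len(fixed) < len(ops):
        frames = time_frames(ops, preds, succs, latency, fixed)
        # Distribution graph: expected number of busy units per op kind and step.
        dg = defaultdict(lambda: [0.0] * latency)
        for o in ops:
            for t in range(latency):
                dg[kind[o]][t] += prob(frames[o], t)
        best = None
        for o in ops:
            if o in fixed:
                continue
            lo, hi = frames[o]
            for t in range(lo, hi + 1):
                trial = time_frames(ops, preds, succs, latency, {**fixed, o: t})
                # Self force plus the indirect force on every op whose frame shrinks.
                force = sum(dg[kind[q]][s] * (prob(trial[q], s) - prob(frames[q], s))
                            for q in ops for s in range(latency))
                if best is None or force < best[0] - 1e-9:
                    best = (force, o, t)
        _, o, t = best
        fixed[o] = t
    return fixed


def fu_count(kind, schedule):
    """Units needed per kind = peak number of ops of that kind in one step."""
    busy = defaultdict(lambda: defaultdict(int))
    for o, t in schedule.items():
        busy[kind[o]][t] += 1
    return {k: max(steps.values()) for k, steps in busy.items()}


if __name__ == "__main__":
    # Hypothetical DFG for (x1*c1 + x2*c2) + (x3*c3 + x4*c4): four products, three sums.
    kind = {"m1": "mul", "m2": "mul", "m3": "mul", "m4": "mul",
            "a1": "add", "a2": "add", "a3": "add"}
    preds = {"m1": [], "m2": [], "m3": [], "m4": [],
             "a1": ["m1", "m2"], "a2": ["m3", "m4"], "a3": ["a1", "a2"]}
    for latency in (3, 4, 5):
        sched = force_directed_schedule(kind, preds, latency)
        print(latency, sched, fu_count(kind, sched))
```

Because the heuristic commits one operation at a time, it can end with more units than the true minimum for a given latency; exact formulations such as integer linear programming trade run time for optimality.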

Embedded systems are trending toward programmable solutions to meet time-to-market (TTM) requirements under unstable, evolving standards. Process technology improvements and architectural innovation continuously raise microprocessor performance to sustain complex multimedia applications. However, novel products that support new standards demand extremely high performance, which a single programmable processor cannot deliver power- and cost-efficiently, especially in battery-powered portable devices.
For years, the industry has addressed this problem with a heterogeneous approach that attaches application-specific hardware accelerators to the host embedded processor. Pre-designed and pre-verified IP modules can significantly reduce development time, but such hardware IP seldom matches the requirements of a given application. Even when developers design an accelerator from scratch, some over-design is needed to make the hardware reusable. In addition, the hardware/software interface is tedious and error-prone to build. These issues motivate a coprocessing datapath generator that synthesizes a customized hardware accelerator together with its interface modules.
This thesis proposes a DSP datapath generator that accepts user-specified constraints and produces synthesizable Verilog code for an optimized hardware accelerator. For a specified speed requirement, the generator minimizes the number of concurrent functional units to reduce cost. For large-scale accelerators, a multistage interconnection network (MIN) serves as the interconnection template, reducing the routing complexity and silicon area of conventional multiplexer (MUX)-based architectures. The generated DSP datapath is wrapped with an AMBA AHB interface and comes with an automatically generated software driver, which eases integration into standard platforms (e.g., commercial ARM- or PowerPC-based hardware platforms). The “Example” chapter demonstrates the generation of DCT, FFT, and DWT accelerators, together with a complete accelerated JPEG encoder system.
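A MIN replaces per-destination multiplexers with stages of 2×2 switches. The Python below is a minimal sketch assuming a Benes network, the classic rearrangeable MIN, routed with the looping algorithm: the two inputs that share a first-stage switch, and the two outputs that share a last-stage switch, are forced into opposite halves, and each half is then routed recursively. The function names and the nested switch-setting format are hypothetical, not the generator's own data structures. Under a simplified cost model, a Benes network with N = 2^n ports has 2n − 1 stages of N/2 switches; for N = 16 that is 56 switches, each roughly two 2:1 multiplexers (112 in total), versus N(N − 1) = 240 two-input multiplexers for a full MUX crossbar. A Benes network realizes any permutation, but it cannot fan one source out to several destinations in the same cycle.

```python
def route(perm):
    """Looping algorithm for a Benes network: input i must reach output perm[i].

    Returns a bool (is the switch crossed?) for a single 2x2 switch, otherwise a
    tuple (first-stage settings, upper subnetwork, lower subnetwork, last-stage settings).
    """
    n = len(perm)
    if n < 2 or n & (n - 1):
        raise ValueError("port count must be a power of two")
    if n == 2:
        return perm[0] == 1
    inv = [0] * n
    for i, o in enumerate(perm):
        inv[o] = i
    half_of = [None] * n  # 0: routed via the upper subnetwork, 1: via the lower one
    for first in range(n):
        if half_of[first] is not None:
            continue
        half_of[first] = 0
        i = first
        while True:
            # The other output on perm[i]'s last-stage switch must come from the other half.
            j = inv[perm[i] ^ 1]
            if half_of[j] is not None:
                break
            half_of[j] = 1 - half_of[i]
            # j's neighbour on its first-stage switch must then take the opposite half.
            k = j ^ 1
            if half_of[k] is not None:
                break
            half_of[k] = 1 - half_of[j]
            i = k
    half = n // 2
    upper, lower = [0] * half, [0] * half
    for i in range(n):
        (upper if half_of[i] == 0 else lower)[i // 2] = perm[i] // 2
    first_stage = [half_of[2 * s] == 1 for s in range(half)]
    last_stage = [half_of[inv[2 * t]] == 1 for t in range(half)]
    return (first_stage, route(upper), route(lower), last_stage)


def propagate(net, data):
    """Push values through the configured network and return the output ports."""
    if isinstance(net, bool):
        return [data[1], data[0]] if net else list(data)
    first_stage, upper, lower, last_stage = net
    up_in, lo_in = [], []
    for s, crossed in enumerate(first_stage):
        a, b = data[2 * s], data[2 * s + 1]
        up_in.append(b if crossed else a)
        lo_in.append(a if crossed else b)
    up_out, lo_out = propagate(upper, up_in), propagate(lower, lo_in)
    out = []
    for t, crossed in enumerate(last_stage):
        out += [lo_out[t], up_out[t]] if crossed else [up_out[t], lo_out[t]]
    return out


def count_switches(net):
    """Number of 2x2 switches in a routed network."""
    if isinstance(net, bool):
        return 1
    first_stage, upper, lower, last_stage = net
    return len(first_stage) + count_switches(upper) + count_switches(lower) + len(last_stage)


def benes_switch_count(n):
    """(2*log2(n) - 1) stages of n/2 switches each."""
    return (2 * (n.bit_length() - 1) - 1) * (n // 2)


if __name__ == "__main__":
    perm = [3, 7, 0, 5, 1, 6, 2, 4]  # input i must reach output perm[i]
    net = route(perm)
    out = propagate(net, [f"in{i}" for i in range(8)])
    print(out)  # output port perm[i] receives in{i}
    print(benes_switch_count(16), 16 * 15)
```

Each new data-movement pattern needs a fresh call to route: because the network is rearrangeable rather than strictly non-blocking, changing one connection may require resetting switches on existing ones.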

Chapter 1 Introduction
1.1 SoC
1.2 Heterogeneous Computing Platform
1.3 CASCADE - Configurable and Scalable DSP Environment
1.4 Automatic Generation of Coprocessing Datapaths
1.5 Thesis Organization
Chapter 2 Related Works
2.1 HW/SW Codesign framework
2.1.1 COSYMA
2.1.2 Vulcan
2.2 Configurable Processor
2.2.1 Tensilica
2.2.2 ARM Coprocessor
2.2.3 ARC Core
Chapter 3 Architectures of the Coprocessing Datapath
3.1 Conventional Dataflow Architectures
3.2 SIU-based Architectures
3.2.1 Interconnection
3.2.2 Storage
3.2.3 Functional Units
3.2.4 AHB Wrapper
Chapter 4 Automatic Generation of the Coprocessing Datapath
4.1 High Level Synthesis and Our Proposed Architecture
4.2 Synthesis for MUX-based Architecture
4.2.1 Operation Scheduling (dfg2sch)
4.2.2 Stream Interface Converting (sch2siu)
4.2.3 Storage Content Allocation (siu2mp)
4.2.4 RTL Verilog Generation (mp2v)
4.3 MIN-based Architecture
4.3.1 Looping Algorithm
4.3.2 Modification in Cost Function
4.3.3 Output Conflict Problem
Chapter 5 Example
5.1 Baseline JPEG (Fast DCT kernel)
5.1.1 JPEG Encoder
5.1.2 Fast DCT
5.1.3 Results
5.1.4 Layout
5.2 FFT
5.2.1 Results
5.2.2 Layout
5.3 DWT
5.3.1 Results
5.3.2 Layout
5.4 Summary
Chapter 6 Conclusion and Future Work
6.1 Conclusion
6.2 Future work
Appendix A. Usage of Tool
Appendix B. File Formats
B.1 Data Flow Graph Format (.dfg)
B.2 Scheduled Format (.sch)
B.3 Stream Format (.siu)
B.4 Memory Port Format (.mp)
B.5 RTL Verilog file description
Appendix C. Declaration of Input/Output in <project_name>_fu.v and <project_name>_fu_syn.v
Appendix D. File Relationship

