National Digital Library of Theses and Dissertations in Taiwan

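Graduate student: 林宏光
Thesis title: High-Performance Reconfigurable Sub-Word Parallel Multiplier-Accumulator Design (高效能且可組態之子字組平行化乘加器設計)
Advisor: Juinn-Dar Huang (黃俊達)
Degree: Master's
Institution: National Chiao Tung University
Department: Department of Electronics Engineering
Field of study: Engineering
Discipline: Electrical and Computer Engineering
Thesis type: Academic thesis
Year of publication: 2006
Academic year of graduation: 94 (ROC calendar)
Language: English
Number of pages: 88
Keywords (translated from Chinese): multiplier; multiplier-accumulator; reconfigurable; parallel; datapath; multimedia; arithmetic unit; high performance
Keywords (English): multiplier; multiply-accumulate; MAC; SIMD; parallel; Booth; Wallace; high performance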
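This thesis proposes a design methodology for a high-performance multiplier-accumulator (MAC). Besides supporting sub-word parallelism (SWP), the MAC can perform mixed-mode operations and offers more flexible sub-word configuration. To realize SWP, we propose a new SWP partial product array and a novel SWP partial product reduction tree. Because it reuses the original MAC hardware, the SWP MAC adds only a small delay and some area. The proposed MAC is dynamically reconfigurable, synthesizable, reusable, and verifiable. We implement our design and previous designs and compare them. Experimental results show that our approach improves on and outperforms earlier methods, both in theory and in practice, in delay, area, and power consumption.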
This thesis presents a design methodology for a high-performance reconfigurable multiplier-accumulator (MAC) that supports sub-word parallelism (SWP) together with additional features such as mixed-mode operation and a flexible scheme for sub-word combination and mode assignment. To perform SWP on the proposed scalar MAC, a new SWP partial product array and a novel speed-optimized SWP partial product reduction tree are proposed. At the cost of a slight delay and some area overhead, the SWP MAC uses essentially the same hardware as the proposed scalar MAC. The whole design is dynamically reconfigurable, fully synthesizable, reusable, and verifiable. The proposed designs and relevant previous works are implemented and compared. Experimental results demonstrate that the proposed SWP MAC design outperforms previous works, both in theory and in practice, in terms of critical path delay, area cost, and power consumption.
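As a rough illustration of what sub-word parallel multiply-accumulate means at the functional level, the following Python sketch models one word-wide datapath that is treated either as a single multiplier or as several independent narrower lanes. It is a behavioral reference model only, not the thesis's hardware: the function names `to_signed` and `swp_mac`, the 32-bit word width, the 8/16/32-bit sub-word widths, the signed two's-complement interpretation, and the unbounded per-lane accumulators are all assumptions made for this example.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of `value` as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def swp_mac(a, b, acc, word_bits=32, sub_bits=16):
    """Behavioral model of a sub-word parallel multiply-accumulate.

    `a` and `b` are packed words of `word_bits` bits. Each is split into
    lanes of `sub_bits` bits (lane 0 is the least significant). Every lane
    computes acc[i] + a_i * b_i independently, with signed operands.
    Widths and naming are assumptions of this sketch, not the thesis's spec.
    """
    if word_bits % sub_bits != 0:
        raise ValueError("sub_bits must divide word_bits")
    lanes = word_bits // sub_bits
    if len(acc) != lanes:
        raise ValueError("need one accumulator per lane")
    mask = (1 << sub_bits) - 1
    result = []
    for i in range(lanes):
        x = to_signed((a >> (i * sub_bits)) & mask, sub_bits)
        y = to_signed((b >> (i * sub_bits)) & mask, sub_bits)
        result.append(acc[i] + x * y)
    return result


# Two 16-bit lanes: lane 0 is (-2) * 5, lane 1 is 3 * 2.
print(swp_mac(0x0003FFFE, 0x00020005, [0, 0]))  # -> [-10, 6]
```

Calling the same function with `sub_bits=32` reduces it to an ordinary scalar MAC, which mirrors the idea in the abstract that the SWP mode reuses essentially the same datapath as the scalar one. Saturation, rounding, and unsigned or mixed-mode operation are deliberately left out of this sketch.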
CONTENTS

Abstract (Chinese)
Abstract (English)
Acknowledgment
Contents
List of Tables
List of Figures
Chapter 1 Introduction
Chapter 2 Previous Works
2.0 Overview
2.1 Prerequisites
2.1.1 Simple Multiplication & Booth's Algorithm
2.1.2 Acceleration of Multiplication Flow
2.1.3 Modified Booth's Algorithm (MBA)
2.2 Related Works
2.2.1 Partial Product Generation (PPG)
2.2.2 Three-Dimensional-Method (TDM) PPRT
2.2.3 High-Speed Adders
2.2.4 Sub-Word Parallelism (SWP)
2.3 Summaries of Previous Works
Chapter 3 Proposed MAC Designs
3.0 Overview
3.1 Scalar MAC (SMAC) Design
3.1.0 Specification
3.1.1 Scalar Partial Product Generation (SPPG)
3.1.2 Scalar Partial Product Reduction Tree (SPPRT)
3.1.3 Scalar Carry-Propagate Adder (SCPA)
3.1.4 Summaries of the Proposed Scalar MAC Design
3.2 Sub-Word Parallel MAC (SWP MAC) Design
3.2.0 Specification
3.2.1 Sub-Word Parallel MAC Execution Flow
3.2.2 Sub-Word Parallel PPG (SWPPG)
3.2.3 Sub-Word Parallel PPRT (SWPPRT)
3.2.4 Sub-Word Parallel CPA (SWCPA)
3.2.5 Summaries of the Proposed SWP MAC Design
Chapter 4 Experimental Results
4.0 Overview
4.1 Implementation
4.2 Discussion of Experimental Results
4.2.0 Overview
4.2.1 Delay Comparison
4.2.2 Area Comparison
4.2.3 Power Comparison
Chapter 5 Application Notes
5.0 Overview
5.1 Functionality Enhancement
5.1.1 Multiply-Accumulate (MAC) Operation
5.1.2 Multiply-Negate (MAN) Operation
5.1.3 Unsigned Operation
5.1.4 Mixed-Mode Operation
5.2 Overflow/Underflow Check for FXP Numbers
5.2.1 Fixed-Point (FXP) Representation
5.2.2 Maintaining Precision & Accuracy
5.2.3 Saturation & Overflow/Underflow for Integers
5.2.4 Rounding of Fractions
5.3 Reconfigurable Parameters Setup
Chapter 6 Conclusions
Future Works
Bibliography
[1] K. K. Parhi, VLSI Digital Signal Processing Systems: Design and Implementation, p. 698, p. 484, p. 488, John Wiley & Sons, 1999.
[2] P. Lapsley, J. Bier, A. Shoham and E. Lee, DSP Processor Fundamentals: Architectures and Features, p. 9, p. 35, p. 47, Berkeley Design Technology Inc., 1996.
[3] B. Parhami, Computer Arithmetic Algorithms and Hardware Design, pp. 204-205, pp. 149-151, pp. 133-134, pp. 98-99, Oxford University Press, New York, 2000.
[4] O. L. MacSorley, "High-speed arithmetic in binary computers", Proc. IRE, vol. 49, pp. 67-91, 1961.
[5] C. Wallace, “A Suggestion for a Fast Multiplier,” IEEE Trans. on Electronic Computers, vol. 13, pp. 14-17, 1964.
[6] S. Krithivasan and M. J. Schulte, “Multiplier Architectures for Media Processing,” Proc. 37th Asilomar Conf. Signals, Systems, and Computers, pp. 2193-2197, Nov. 2003.
[7] M. Keating and P. Bricaud, Reuse Methodology Manual for System-on-Chip Designs, Kluwer Academic Publishers, third edition, 2002.
[8] V. G. Oklobdzija, D. Villeger, and S. S. Liu, "A Method for Speed Optimized Partial Product Reduction and Generation of Fast Parallel Multipliers Using an Algorithmic Approach," IEEE Trans. Computers, vol. 45, no. 3, pp. 294-305, March 1996.
[9] W.-C. Yeh and C.-W. Jen, “High-Speed Booth Encoded Parallel Multiplier Design,” IEEE Trans. Computers, vol. 49, no. 7, pp. 692-701, July 2000.
[10] A. Danysh and D. Tan, "Architecture and Implementation of a Vector/SIMD Multiply-Accumulate Unit," IEEE Transactions on Computers, vol. 54, no. 3, pp. 284-293, Mar., 2005.
[11] D. Tan, A. Danysh, M. Liebelt, "Multiple-Precision Fixed-Point Vector Multiply-Accumulator Using Shared Segmentation," Proc. 16th IEEE Symposium on Computer Arithmetic (ARITH-16 '03), p. 12, 2003.
[12] G. W. Bewick, "Fast Multiplication: Algorithms and Implementation," PhD dissertation, pp. 14-16, appendix A, pp. 13-14, Stanford University, Department of Electrical Engineering, Feb., 1994.
[13] A. D. Booth, "A Signed Binary Multiplication Technique," Quarterly J. Mechanical and Applied Math., vol. 4, pp. 236-240, 1951.
[14] L. Dadda, “Some Schemes for Parallel Multipliers,” Alta Frequenza, pages 349-356, March 1965.
[15] M. Santoro, “Design and Clocking of VLSI Multipliers”, PhD dissertation, Stanford University, Department of Electrical Engineering, 1989.
[16] R. Fried, "Minimizing Energy Dissipation in High-Speed Multipliers," Proc. 1997 Int'l Symp. Low Power Electronics and Design, pp. 214-219, 1997.
[17] M. Annaratone and W. Z. Shen, “The Design of an LSI Booth Multiplier,” Carnegie Mellon University Thesis report (CS), no. 150, 1984.
[18] A. A. Farooqui and V. G. Oklobdzija, “General Data-Path Organization of a MAC Unit for VLSI Implementation of DSP Processors,” Proc. 1998 IEEE Int'l Symp. Circuits and Systems, vol. 2, pp. 260-263, 1998.
[19] S. Vassiliadis, E.M. Schwarz, and B.M. Sung, “Hard-Wired Multipliers with Encoded Partial Products,” IEEE Trans. Computers, vol. 40, no. 11, pp. 1181-1197, Nov. 1991.
[20] P. F. Stelling, C. U. Martel, V. G. Oklobdzija, and R. Ravi, “Optimal circuits for parallel multipliers,” IEEE Transactions on Computers, vol. 47, no. 3, pp. 273-285, Mar. 1998.
[21] D. A. Patterson and J. L. Hennessy, Computer Organization and Design: The Hardware/Software Interface, pp. 241-249, Morgan Kaufmann Publishers, Inc., 2nd Edition, 1998.
[22] R. P. Brent and H. T. Kung, “A regular layout for parallel adders,” IEEE Transactions on Computers, vol. 31, no. 3, pp. 260-264, 1982.
[23] T. Han, D. A. Carlson, and Steven P. Levitan, “Fast Area Efficient VLSI Adders,” IEEE International Conference on Computer Design, pp. 418-422, October 1987.
[24] H. Ling, "High-Speed Binary Adder," IBM J. Res. Develop., vol. 25, no. 3, pp. 156-166, May 1981.
[25] G. Dimitrakopoulos and D. Nikolos, “High-Speed Parallel-Prefix VLSI Ling Adders,” IEEE Trans. Computers, vol. 54, no. 2, Feb. 2005.
[26] Y. -C. Fong, "A High-Speed Area-Minimized Reconfigurable Adder Design," Master’s thesis, National Chiao Tung University, Department of Electronics Engineering, Jul. 2006.
[27] Analog Devices, Blackfin Processor Hardware Reference, revision 3.0, Sep., 2004. Available from www.analog.com.
[28] Texas Instruments, TMS320C6000 CPU and Instruction Set Reference Guide, revision F, Oct. 2000. Available from www.ti.com.
[29] C. G. Lee and M. G. Stoodley, “Simple Vector Microprocessors for Multimedia Applications,” Proc. 31st Ann. ACM/IEEE Int’l Symp. Microarchitecture, pp. 25-36, 1998.
[30] R. B. Lee, “Multimedia Extensions for General-Purpose Processors,” Proc. Signal Processing Systems (SIPS ’97), pp. 9-23, Nov. 1997.
[31] N. Burgess, “PAPA—Packed Arithmetic on a Prefix Adder For Multimedia Applications,” Proc. IEEE Int’l Conf. Application-Specific Systems, Architectures and Processors, pp. 197-207, July 2002.
[32] A. A. Farooqui, V. G. Oklobdzija, and F. Chehrazi, “Multiplexer Based Adder for Media Signal Processing,” Proc. 1999 Int’l Symp. VLSI Technology, Systems, and Applications, pp. 100-103, June 1999.
[33] C. R. Baugh and B. A. Wooley, "A two's complement parallel array multiplication algorithm," IEEE Transactions on Computers, vol. 22, pp. 1045-1047, December 1973.
[34] M. J. Schulte, L. P. Marquette, S. Krithivasan, E. G. Walters, and J. Glossner, “Combined Multiplication and Sum-of-Squares Units,” Proceedings of the IEEE International Conference on Application-Specific Systems, Architectures, and Processors, pp. 204–214, June, 2003.
[35] S. Krithivasan, M. J. Schulte, and J. Glossner, "A Subword-Parallel Multiplication and Sum-of-Squares Unit," Proc. IEEE Computer Society Annual Symposium on VLSI: Emerging Trends in VLSI Systems Design (ISVLSI'04), p. 273, 2004.
[36] T. K. Callaway and E. E. Swartzlander, Jr., “Power-Delay Characteristics of CMOS Multipliers,” Proceedings of the 13th IEEE Symposium on Computer Arithmetic, pp. 26-32, 1997.
[37] Artisan Components, UMC 0.18μm L180 Process 1.8-Volt Sage-X Standard Cell Library Databook, release 2.0, pp. 32-33, Nov. 2003.
[38] Synopsys Inc., DesignWare Building Block IP Documentation Overview, Jan. 17, 2005.
[39] Synopsys Inc., Design Compiler User Guide, version W-2004.12, Dec., 2004.
[40] Synopsys Inc., PrimePower Manual, version W-2004.12, Dec., 2004.
[41] Cadence Design Systems Inc., Verilog-XL User Guide, version 3.4, Jan., 2002.
[42] Novas Software Inc., nLint User Guide and Tutorial, version 2.2, Dec., 2004.
[43] TransEDA Technology Ltd., Verification Navigator User Guide, version 2005.03, Mar., 2005.
[44] Cadence Design Systems Inc., Encounter Conformal Equivalence Checking User Guide, version 5.1, June, 2005.