National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)

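Author: 林宏光
Title: 高效能且可組態之子字組平行化乘加器設計
Title (English): High-Performance Reconfigurable Sub-Word Parallel Multiplier-Accumulator Design
Advisor: 黃俊達 (Juinn-Dar Huang)
Degree: Master's
Institution: National Chiao Tung University (國立交通大學)
Department: Department of Electronics Engineering (電子工程系所)
Discipline: Engineering
Field: Electrical and Computer Engineering
Thesis type: Academic thesis
Year of publication: 2006
Academic year of graduation:
Language: English
Number of pages:
Keywords (Chinese, translated): multiplier, multiplier-accumulator, reconfigurable, parallelization, datapath, multimedia, arithmetic unit, high performance
Keywords (English): multiplier, multiply-accumulate, MAC, SIMD, parallel, Booth, Wallace, high performance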

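Abstract (Chinese, translated)

This thesis proposes a design method for a high-performance multiplier-accumulator (MAC). Besides supporting sub-word parallelism, the MAC can perform mixed-mode operations and offers more flexible sub-word configuration. We propose a new sub-word parallel partial product array and a novel sub-word parallel partial product reduction tree to realize sub-word parallelism. Because it reuses the original MAC hardware, the sub-word parallel MAC adds only a slight delay and a small amount of area. The proposed MAC is dynamically reconfigurable, synthesizable, reusable, and verifiable. We implement our design and previous designs and compare them. Experimental results show that, in delay, area, and power consumption, our approach improves on and outperforms the previous methods both in theory and in practice.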

Abstract (English)

This thesis presents a design methodology for a high-performance reconfigurable multiplier-accumulator (MAC) that supports sub-word parallelism (SWP) along with additional features such as mixed-mode operation and a flexible scheme for sub-word combination and mode assignment. To perform SWP on the proposed scalar MAC, a new SWP partial product array and a novel speed-optimized SWP partial product reduction tree are proposed. The SWP MAC uses essentially the same hardware as the proposed scalar MAC, at the cost of a slight delay increase and some area overhead. The whole design is dynamically reconfigurable, fully synthesizable, reusable, and verifiable. The proposed designs and relevant previous works are implemented and compared. Experimental results show that the proposed SWP MAC design outperforms previous works, both in theory and in practice, in critical path delay, area cost, and power consumption.
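To make the idea of sub-word parallelism concrete, the following is a minimal behavioral sketch in Python of what an SWP multiply-accumulate computes. It is not the thesis's hardware design, which shares one partial product array and one reduction tree across modes. The 32-bit word width, the lane widths, the signed two's-complement interpretation, and the function names `split_lanes` and `swp_mac` are all assumptions made for illustration.

```python
def split_lanes(word, lane_bits, word_bits=32):
    """Split a packed unsigned word into signed two's-complement lanes.

    Lanes are returned least-significant lane first. The 32-bit default
    word width is an assumption, not a figure taken from the thesis.
    """
    mask = (1 << lane_bits) - 1
    lanes = []
    for i in range(word_bits // lane_bits):
        raw = (word >> (i * lane_bits)) & mask
        # Reinterpret the raw bits as a signed value if the lane's sign bit is set.
        if raw >> (lane_bits - 1):
            raw -= 1 << lane_bits
        lanes.append(raw)
    return lanes


def swp_mac(a, b, acc, lane_bits, word_bits=32):
    """Lane-wise multiply-accumulate: returns [acc[i] + a_i * b_i for each lane].

    `acc` holds one accumulator per lane as a plain Python integer, so this
    model ignores accumulator width, overflow, and saturation.
    """
    a_lanes = split_lanes(a, lane_bits, word_bits)
    b_lanes = split_lanes(b, lane_bits, word_bits)
    return [c + x * y for c, x, y in zip(acc, a_lanes, b_lanes)]


# Two 16-bit lanes packed in one 32-bit word: a = (3, -2), b = (5, 4), high lane first.
print(swp_mac(0x0003FFFE, 0x00050004, [0, 0], 16))  # [-8, 15]
```

Changing `lane_bits` to 8 or 32 reinterprets the same operand words as four narrow lanes or one full-width operand, which mirrors in software the mode switching that a reconfigurable SWP MAC performs in hardware.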

CONTENTS

Abstract (Chinese)
Abstract (English)
Acknowledgment
Contents
List of Tables
List of Figures
Chapter 1 Introduction
Chapter 2 Previous Works
2.0 Overview
2.1 Prerequisites
2.1.1 Simple Multiplication & Booth's Algorithm
2.1.2 Acceleration of Multiplication Flow
2.1.3 Modified Booth's Algorithm (MBA)
2.2 Related Works
2.2.1 Partial Product Generation (PPG)
2.2.2 Three-Dimensional-Method (TDM) PPRT
2.2.3 High-Speed Adders
2.2.4 Sub-Word Parallelism (SWP)
2.3 Summaries of Previous Works
Chapter 3 Proposed MAC Designs
3.0 Overview
3.1 Scalar MAC (SMAC) Design
3.1.0 Specification
3.1.1 Scalar Partial Product Generation (SPPG)
3.1.2 Scalar Partial Product Reduction Tree (SPPRT)
3.1.3 Scalar Carry-Propagate Adder (SCPA)
3.1.4 Summaries of the Proposed Scalar MAC Design
3.2 Sub-Word Parallel MAC (SWP MAC) Design
3.2.0 Specification
3.2.1 Sub-Word Parallel MAC Execution Flow
3.2.2 Sub-Word Parallel PPG (SWPPG)
3.2.3 Sub-Word Parallel PPRT (SWPPRT)
3.2.4 Sub-Word Parallel CPA (SWCPA)
3.2.5 Summaries of the Proposed SWP MAC Design
Chapter 4 Experimental Results
4.0 Overview
4.1 Implementation
4.2 Discussion of Experimental Results
4.2.0 Overview
4.2.1 Delay Comparison
4.2.2 Area Comparison
4.2.3 Power Comparison
Chapter 5 Application Notes
5.0 Overview
5.1 Functionality Enhancement
5.1.1 Multiply-Accumulate (MAC) Operation
5.1.2 Multiply-Negate (MAN) Operation
5.1.3 Unsigned Operation
5.1.4 Mixed-Mode Operation
5.2 Overflow/Underflow Check for FXP Numbers
5.2.1 Fixed-Point (FXP) Representation
5.2.2 Maintaining Precision & Accuracy
5.2.3 Saturation & Overflow/Underflow for Integers
5.2.4 Rounding of Fractions
5.3 Reconfigurable Parameters Setup
Chapter 6 Conclusions
Future Works
Bibliography
