Author: 翁巍庭
Author (English): Weng, Wei-Ting
Title: 改良型浮點數矩陣乘法器
Title (English): Improved Floating-Point Matrix Multiplier
Advisor: 杜迪榕
Advisor (English): Duh, Dyi-Rong
Committee Members: 傅榮勝、阮夙姿、杜迪榕、陳依蓉、林聰吉
Committee Members (English): Fu, Jung-Sheng; Juan, Justie Su-Tzu; Duh, Dyi-Rong; Chen, Yi-Jung; Lin, Tsung-Chi
Oral Defense Date: 2011-07-08
Degree: Master's
Institution: 國立暨南國際大學 (National Chi Nan University)
Department: 資訊工程學系 (Department of Computer Science and Information Engineering)
Discipline: Engineering
Field: Electrical and Computer Engineering
Document Type: Academic thesis
Publication Year: 2011
Graduation Academic Year: 99 (ROC calendar)
Language: English
Pages: 39
Keywords (Chinese): 矩陣乘法、浮點乘法、Booth 編碼法、部分乘積產生器
Keywords (English): matrix multiplication; floating-point multiplication; Booth encoding; partial products generator
Statistics:
  • Cited: 0
  • Views: 252
  • Downloads: 26
Floating-point matrix multipliers are widely used in scientific computation, and extensive research has steadily improved their performance. Because matrix multiplication involves a large number of multiplications and additions, Bensaali et al. recently designed a modular floating-point matrix multiplier on FPGAs that implements these multiplications and additions with multiple identical components and stores each intermediate result as a single vector in a register. The reusability of the components lowers the hardware cost, but it also lengthens the delay. To address this, L. C. Yang and Professor D. R. Duh proposed a new design that preserves the modular advantages while instead storing each intermediate result in two vectors during the computation. Compared with the earlier work, this design greatly shortens the delay but increases the cost. This study adopts Booth encoding to improve the multiplication part of the modular design, reducing the number of partial products; the cost drops substantially and the delay becomes even shorter, giving the floating-point matrix multiplier better performance.
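As the abstract notes, matrix multiplication reduces to many multiply-accumulate operations: each output entry is an inner product built by one multiplier and one accumulator step per term. A plain software sketch of that structure (illustrative Python only, not the FPGA design described in the thesis):

```python
def matmul(A, B):
    """Schoolbook matrix product: each entry C[i][j] is built from m
    multiply-accumulate steps, the operations a hardware matrix
    multiplier implements with its multiplier and adder components."""
    n, m, p = len(A), len(B), len(B[0])
    C = [[0.0] * p for _ in range(n)]
    for i in range(n):
        for j in range(p):
            acc = 0.0
            for k in range(m):
                acc += A[i][k] * B[k][j]  # one multiplication, one accumulation
            C[i][j] = acc
    return C
```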
Floating-point matrix multipliers are widely used in scientific computations, and a great deal of effort has been devoted to achieving higher performance. Matrix multiplication consists of many multiplications and accumulations. In 2007, Bensaali et al. proposed a design of floating-point matrix multiplication on FPGAs that uses duplicate components to implement the multiplications and accumulations and stores the intermediate result as a single vector in a register. The reusability of these components reduces the hardware cost but lengthens the delay. In 2009, Yang and Duh proposed a new modular design of floating-point matrix multiplier that stores the intermediate result as two vectors instead of one; it achieves a shorter delay than Bensaali et al.'s work, but at a higher cost. This work modifies Yang and Duh's design by applying Booth encoding to the multiplication in order to reduce the number of partial products. As a result, the improved floating-point matrix multiplier achieves better performance, with a shorter delay and much lower hardware cost than Yang and Duh's design.
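The Booth encoding named above recodes the multiplier so that fewer signed partial products need to be summed: radix-4 recoding replaces n single-bit partial products with roughly n/2 digits, each selecting 0, ±a, or ±2a. The following minimal Python sketch shows the recoding for unsigned operands; it illustrates the general technique only, not the thesis's hardware generator (the function names and the 8-bit default width are assumptions).

```python
def booth_radix4_digits(b, n_bits):
    """Recode an unsigned n_bits-wide multiplier b into radix-4 Booth
    digits, each in {-2, -1, 0, 1, 2}. Roughly n/2 signed digits replace
    the n partial products of a plain array multiplier."""
    shifted = b << 1  # append the implicit bit b_{-1} = 0 on the right
    digits = []
    for i in range(0, n_bits + 1, 2):
        window = (shifted >> i) & 0b111  # overlapping 3-bit window
        lo, mid, hi = window & 1, (window >> 1) & 1, (window >> 2) & 1
        digits.append(lo + mid - 2 * hi)  # digit = b_{2j-1} + b_{2j} - 2*b_{2j+1}
    return digits


def booth_multiply(a, b, n_bits=8):
    """Multiply a * b as the sum of Booth-recoded partial products:
    each digit contributes digit * a * 4^i, i.e. one signed partial
    product of 0, +-a, or +-2a, shifted into place."""
    return sum(d * a * (4 ** i)
               for i, d in enumerate(booth_radix4_digits(b, n_bits)))
```

For example, b = 11 (binary 1011) recodes to the digits [-1, -1, 1, 0, 0], so a * 11 is computed as -a - 4a + 16a: three nonzero partial products instead of three single-bit ones plus carries at every weight.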
Acknowledgements ............................................................................................................... i
Chinese Abstract ..................................................................................................................ii
Abstract ...............................................................................................................................iii
Content ............................................................................................................................... iv
List of Figures ..................................................................................................................... vi
List of Tables .....................................................................................................................viii
1 Introduction ................................................................................................................. 1
1.1 Background........................................................................................................ 1
1.1.1 Hardware Adders ................................................................................... 2
1.1.2 IEEE Standard 754 ................................................................................ 4
1.2 Related Work ..................................................................................................... 4
1.3 Overview ........................................................................................................... 6
1.4 Organization of This Thesis............................................................................... 7
2 Matrix Multiplication and Floating-Point Operations ............................................ 8
2.1 Matrix Multiplication ........................................................................................ 8
2.2 Floating-Point Multiplication ............................................................................ 9
2.3 Floating-Point Addition ................................................................................... 14
3 Improved Floating-Point Matrix Multiplier ........................................................... 17
3.1 Proposed Algorithm......................................................................................... 17
3.2 Proposed Architecture...................................................................................... 20
3.2.1 Refined Partial Products Generator ..................................................... 21
3.2.2 Revised Partial Products Reduction .................................................... 25
3.3 Delay and Cost ................................................................................................ 28
3.4 Comparison...................................................................................................... 32
4 Conclusion and Future Work ................................................................................... 35
4.1 Concluding Remarks ....................................................................................... 35
4.2 Future Work ..................................................................................................... 36
Bibliography...................................................................................................................... 37
[1] A. Beaumont-Smith, N. Burgess, S. Lefrere, and C.C. Lim, “Reduced latency IEEE
floating-point standard adder architectures,” in: Proc. 14th IEEE Symposium on
Computer Arithmetic, pp. 35–42, 1999.
[2] F. Bensaali, A. Amira, and A. Bouridane, “Accelerating matrix product on
reconfigurable hardware for image processing applications,” IEE Proc. Circuits,
Devices and Systems, vol. 152, no. 3, pp. 236–246, Jun. 2005.
[3] F. Bensaali, A. Amira, and R. Sotudeh, “Floating-point matrix product on FPGA,” in:
Proc. ACS/IEEE International Conference on Computer Systems and Applications,
pp. 466–473, 2007.
[4] G. Choe and E.E. Swartzlander Jr., “Merged Arithmetic for computing wavelet
transforms,” in: Proc. 8th Great Lakes Symposium on VLSI, pp. 196–201, 1998.
[5] L. Dadda, “Some schemes for parallel multipliers,” Alta Frequenza, vol. 34, pp.
349–356, 1965.
[6] Y. Dou, S. Vassiliadis, G. K. Kuzmanov, and G. N. Gaydadjiev, “64-bit floating-point
FPGA matrix multiplication,” in: Proc. 2005 ACM/SIGDA 13th International
Symposium on FPGA, pp. 86–95, Feb. 2005.
[7] H.A.H. Fahmy, A.A. Liddicoat, and M.J. Flynn, “Improving the effectiveness of
floating point arithmetic,” in: Proc. 35th Asilomar Conference on Signals, Systems and
Computers, vol. 1, pp. 875–879, Nov. 2001.
[8] K.A. Feiste and E.E. Swartzlander Jr., “Merged arithmetic revisited,” in: Proc. IEEE
Workshop on Signal Processing Systems, pp. 212–221, Nov. 1997.
[9] H.P. Huang and D.R. Duh, “Fast computation algorithm for robot dynamics and its
implementation,” in: Proc. IEEE International Symposium on Industrial Electronics,
pp. 352–356, May 1992.
[10] IEEE Standard for Binary Floating-Point Arithmetic, ANSI/IEEE Standard 754-1985.
[11] J.W. Jang, S.B. Choi, and V.K. Prasanna, “Energy- and time-efficient matrix
multiplication on FPGAs,” IEEE Transactions on Very Large Scale Integration (VLSI)
Systems, vol. 13, no. 11, pp. 1305–1319, Nov. 2005.
[12] V. B. Y. Kumar, S. Joshi, S. B. Patkar, and H. Narayanan, “FPGA based high
performance double-precision matrix multiplication,” International Journal of
Parallel Programming, vol. 38, no. 3–4, pp. 322–338, Feb. 2010.
[13] G. Kuzmanov and W. M. van Oijen, “Floating-point matrix multiplication in a
polymorphic processor,” in: Proc. International Conference on Field-Programmable
Technology, pp. 249–252, Dec. 2007.
[14] P.-M. Seidel and G. Even, “Delay-optimized implementation of IEEE floating-point
addition,” IEEE Transactions on Computers, vol. 53, no. 2, pp. 97–113, Feb. 2004.
[15] W. C. Park, T. D. Han, and S. D. Kim, “Efficient simultaneous rounding method
removing sticky-bit from critical path for floating point addition,” in: Proc. 2nd IEEE
Asia Pacific Conference on ASICs, pp. 223–226, Aug. 2000.
[16] W. C. Park, S. W. Lee, O. Y. Kwon, T. D. Han, and S. D. Kim, “Floating-point
adder/subtractor performing IEEE rounding and addition/subtraction in parallel,”
IEICE Transactions on Information and Systems, vol. E79–D, no. 4, pp. 297–305, Apr.
1996.
[17] N.T. Quach, N. Takagi, and M.J. Flynn, “Systematic IEEE rounding method for
high-speed floating-point multipliers,” IEEE Transactions on Very Large Scale
Integration (VLSI) Systems, vol. 12, no. 5, pp. 511–521, May 2004.
[18] V. Strassen, “Gaussian elimination is not optimal,” Numerische Mathematik, vol. 13,
no. 4, pp. 354–356, Aug. 1969.
[19] E.E. Swartzlander Jr., “Merged arithmetic,” IEEE Transactions on Computers, vol.
C-29, no. 10, pp. 946–950, Oct. 1980.
[20] C. S. Wallace, “A suggestion for a fast multiplier,” IEEE Transactions on Electronic
Computers, vol. EC-13, no. 1, pp. 14–17, Feb. 1964.
[21] L. C. Yang and D. R. Duh, “Optimized design of a floating-point matrix multiplier,”
in: Proc. National Computer Symposium, pp. 300–308, Nov. 2009.
[22] W. C. Yeh and C. W. Jen, “High-speed Booth encoded parallel multiplier design,”
IEEE Transactions on Computers, vol. 49, no. 7, pp. 692–701, Jul. 2000.
[23] L. Zhuo and V.K. Prasanna, “Scalable and modular algorithms for floating-point
matrix multiplication on FPGAs,” in: Proc. 18th International Parallel and
Distributed Processing Symposium, pp. 92–101, Apr. 2004.
[24] L. Zhuo and V.K. Prasanna, “Scalable and modular algorithms for floating-point
matrix multiplication on reconfigurable computing systems,” IEEE Transactions on
Parallel and Distributed Systems, vol. 18, no. 4, pp. 433–448, Apr. 2007.