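Graduate student: Tung, Ying-Li (董盈里)
Thesis title: Design and Implementation of a Power-Efficient Elementary Function Unit with Small Look-up Tables
Thesis title (original Chinese): 高能源效率與低查表之基本運算單元設計與實現
Advisor: Van, Lan-Da (范倫達)
Degree: Master's
Institution: National Chiao Tung University (國立交通大學)
Department: Institute of Computer Science and Engineering
Discipline: Engineering
Subdiscipline: Electrical and Computer Engineering
Thesis type: Academic thesis
Year of publication: 2010
Graduation academic year: 99 (ROC calendar)
Language: English
Number of pages: 44
Keywords: elementary function unit; power-efficient; floating-point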
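Abstract (translated from the Chinese): In this thesis, we propose three power-efficient floating-point elementary function units with small look-up tables. The first unit provides 16-bit exponential, logarithm, reciprocal, and inverse square root operations. The second unit provides the same four operations at 32 bits. The third unit is a dual-precision design that supports all of the above operations and precisions. We use a piecewise linear approximation for the 16-bit operations. For the 32-bit operations, we propose a new two-level polynomial approximation to compute the exponential and logarithm, while the reciprocal and inverse square root are computed with a look-ahead Newton-Raphson method. All operations achieve 1 ulp accuracy. The proposed two-level polynomial approximation reduces the look-up table size by 12%. The elementary function units are designed and implemented in a TSMC 0.18 µm process, and simulation shows that they can operate at a 142 MHz clock. The 16-bit unit consumes 3.52 mW on average and the 32-bit unit 37.55 mW. The dual-precision design consumes 5.74 mW on average in 16-bit mode and 38.49 mW in 32-bit mode.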
Abstract (English): In this work, three power-efficient floating-point elementary function units with small look-up tables are proposed. The first design supports the IEEE-754 half-precision floating-point format and implements exponential, logarithm, reciprocal, and inverse square root operations. The second design supports the IEEE-754 single-precision floating-point format and implements the same four operations. The third design is a dual-precision, eight-mode elementary function unit that supports all of the above functions and precisions. The units employ a piecewise linear approximation scheme for 16-bit operations, a relaxed look-ahead Newton-Raphson method for the 32-bit floating-point reciprocal and inverse square root, and a two-level polynomial approximation for the 32-bit exponential and logarithm. All operations achieve 1 ulp accuracy. Implemented in a TSMC 0.18 µm CMOS process, the proposed units can operate at 142 MHz. The proposed two-level polynomial approximation reduces table size by 12% with respect to previously proposed techniques, without any accuracy loss. The average power of the 16-bit elementary function unit is 3.52 mW, while that of the 32-bit design is 37.55 mW. The dual-precision eight-mode unit consumes 5.74 mW in 16-bit mode and 38.49 mW in 32-bit mode.
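The abstract names two families of techniques: table-seeded Newton-Raphson iteration for the reciprocal and inverse square root, and piecewise linear approximation. The Python sketch below illustrates the textbook forms of both on a mantissa m in [1, 2). It is not the thesis's relaxed look-ahead variant or its two-level polynomial scheme, whose details are not given in this record. The table sizes, function names, and iteration counts are assumptions chosen for illustration only.

```python
import math

SEED_BITS = 4  # hypothetical table index width, not taken from the thesis
_N = 1 << SEED_BITS

# Small seed tables indexed by the top SEED_BITS fraction bits of m in [1, 2),
# sampled at the midpoint of each segment.
RECIP_SEED = [1.0 / (1.0 + (i + 0.5) / _N) for i in range(_N)]
RSQRT_SEED = [1.0 / math.sqrt(1.0 + (i + 0.5) / _N) for i in range(_N)]


def _index(m):
    """Segment index for a mantissa m in [1, 2)."""
    return min(int((m - 1.0) * _N), _N - 1)


def reciprocal(m, iterations=2):
    """Approximate 1/m with the Newton-Raphson step y <- y * (2 - m*y)."""
    y = RECIP_SEED[_index(m)]
    for _ in range(iterations):
        y = y * (2.0 - m * y)
    return y


def rsqrt(m, iterations=2):
    """Approximate 1/sqrt(m) with the step y <- y * (3 - m*y*y) / 2."""
    y = RSQRT_SEED[_index(m)]
    for _ in range(iterations):
        y = y * (3.0 - m * y * y) * 0.5
    return y


def log2_piecewise_linear(m, segments=16):
    """Approximate log2(m) by linear interpolation between segment endpoints."""
    i = min(int((m - 1.0) * segments), segments - 1)
    x0 = 1.0 + i / segments
    x1 = 1.0 + (i + 1) / segments
    # In hardware the endpoint values (or slope and intercept) would come
    # from a look-up table; here they are computed directly for clarity.
    y0, y1 = math.log2(x0), math.log2(x1)
    return y0 + (y1 - y0) * (m - x0) / (x1 - x0)
```

Each Newton-Raphson step roughly squares the relative error of the seed, so a larger seed table buys fewer iterations and vice versa. A complete floating-point unit would also handle the sign and exponent fields, which this mantissa-only sketch leaves out.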
Abstract (Chinese)
Abstract (English)
Acknowledgements
Contents
List of Tables
List of Figures
Chapter 1 Introduction
1.1 Motivation
1.2 Thesis Organization
Chapter 2 Review of Elementary Functions and Floating-Point Representations
2.1 Elementary Functions in Vertex Shader and Fragment Shader
2.2 Representation of Floating-Point Number
Chapter 3 Proposed Floating-Point Elementary Function Unit Designs
3.1 Piecewise Linear Approximation
3.2 Relaxed Look-ahead Newton Method
3.3 Two-level Polynomial Approximation
Chapter 4 Architecture of Elementary Function Units with Small Look-up Tables
4.1 Architecture for Piecewise Linear Approximation
4.2 Architecture for Piecewise Quadratic Approximation and Relaxed Look-ahead Newton Method
Chapter 5 Chip Implementation and Comparison Results
5.1 Chip Implementation
5.2 Simulation Results
5.3 Evaluation
5.4 Comparison
Chapter 6 Conclusion
Bibliography
Biography
