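Student: 翁嘉陽 (Chia-Yang Weng)
Thesis Title: 採用捨棄式乘法器與平方器的硬體函數計算單元之設計最佳化
Thesis Title (English): Design Optimization of Hardware Function Evaluation Units with Truncated Multipliers and Squarers
Advisor: 蕭勝夫 (Shen-Fu Hsiao)
Degree: Master's
University: National Sun Yat-sen University (國立中山大學)
Department: Department of Computer Science and Engineering (graduate program)
Discipline: Engineering
Field: Electrical and Computer Engineering
Thesis Type: Academic thesis
Year of Publication: 2017
Graduation Academic Year: 105 (ROC calendar, i.e., 2016-2017)
Language: Chinese
Pages: 112
Keywords: function evaluation; polynomial approximation; uniform segmentation; digital arithmetic; truncated multiplier; truncated squarer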
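Function approximation plays an important role in computer arithmetic, for example in the special function units of graphics processors and in research on stereo vision and 3D image processing. Hardware function evaluation architectures most commonly combine lookup tables with arithmetic circuits, and among them the uniform-segmentation polynomial approximation architecture is the most mature. This thesis proposes a new method that optimizes the uniform-segmentation polynomial approximation architecture with truncated multipliers and truncated squarers. Previous methods usually fix the coefficient widths and arithmetic-unit sizes by allocating an error budget in advance to each hardware component (such as the LUT and the arithmetic units). This thesis instead considers all error sources together, namely approximation error, quantization error, error caused by truncation in the arithmetic units, and final rounding error, so that the total error budget is used more effectively to size each component, reducing overall circuit area and shortening delay. The thesis also finds that minimizing the table does not guarantee the smallest overall circuit area, because the arithmetic circuits account for more than half of the total area. It therefore gives up some table area (from the error perspective, reducing the approximation and quantization errors so that more of the allowed error can occur in the arithmetic units) in exchange for a smaller arithmetic circuit area, trading the two off to find the balance point at which total area is minimized. The results also show that moderately relaxing the table size gives better area and speed.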
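The truncated multipliers mentioned above save area by discarding the least-significant columns of the partial-product matrix. The Python sketch below is illustrative only: it assumes unsigned n-bit operands, keeps k columns below the n-bit result, and uses as correction constant the expected value of the discarded bits rounded to the weight of the lowest kept column (a simplified constant-correction scheme, not necessarily the constant used in the thesis). The function names are hypothetical. It measures the error against the exact product over all inputs.

```python
def correction_constant(n: int, k: int) -> int:
    """Expected value of the discarded partial-product bits (each a_i*b_j is 1 with
    probability 1/4 for uniformly distributed inputs), rounded to a multiple of the
    weight of the least-significant kept column so it can be added into those columns."""
    expected = sum(1 << (i + j) for i in range(n) for j in range(n) if i + j < n - k) / 4
    step = 1 << (n - k)
    return round(expected / step) * step


def truncated_product(a: int, b: int, n: int, k: int, correction: int = 0) -> int:
    """Unsigned n x n truncated multiplier (hypothetical model): keep partial-product
    bits in columns n - k .. 2n - 1, drop the rest, then add the correction constant.
    The result is in units of 2**-(2n); when correction is a multiple of 2**(n - k),
    its low n - k bits are zero, so hardware would output total >> (n - k)."""
    total = correction
    for i in range(n):
        if (a >> i) & 1:
            for j in range(n):
                if (b >> j) & 1 and i + j >= n - k:
                    total += 1 << (i + j)
    return total


def error_stats(n: int, k: int, correction: int) -> tuple:
    """Exhaustive (mean error, max |error|) versus the exact product, in units of 2**-(2n)."""
    errors = [truncated_product(a, b, n, k, correction) - a * b
              for a in range(1 << n) for b in range(1 << n)]
    return sum(errors) / len(errors), max(abs(e) for e in errors)


if __name__ == "__main__":
    n, k = 6, 2
    c = correction_constant(n, k)                      # 16
    print("no correction:  ", error_stats(n, k, 0))    # (-12.25, 49)
    print("with correction:", error_stats(n, k, c))    # (3.75, 33)
```

For n = 6 and k = 2, 10 of the 36 partial-product bits are dropped. Without correction the error is always negative (mean -12.25 units of 2^-12); the constant moves the mean to 3.75 and lowers the worst case from 49 to 33 units. This residual is the kind of arithmetic-unit truncation error that the abstract counts among the combined error sources.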
Function evaluation is an important operation in the design of special function units in graphics processing units (GPUs) and in other applications such as stereo vision and 3D image processing. Among hardware function evaluation methods, piecewise polynomial approximation (PPA), composed of a look-up table (LUT) and simple arithmetic components such as multipliers and adders, is the most popular approach. In this thesis, we present a new design that uses truncated multipliers and squarers to optimize the area of PPA with uniform segmentation. Unlike previous designs, which determine the bit widths of the hardware components by assigning a separate error budget to each component, we propose a combined error optimization method that jointly considers all error sources, namely approximation error, quantization error, truncation error, and rounding error, so that area cost and delay can be further reduced. We observe that minimizing the LUT size does not necessarily lead to the smallest total area: at small and medium precisions the LUT takes only a small portion of the total area, while the arithmetic components take more than 50%. The experimental results show the trade-off between LUT area and arithmetic-component area, allowing us to find the optimized design with the smallest total area.
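The combined-error idea can be illustrated with a toy exhaustive search. The Python sketch below is a hypothetical model, not the thesis's algorithm or cost model: it approximates f(x) = 2^x on [0, 1) with an N = 8-bit input and a faithfully rounded 8-bit output, uses 2^m uniform segments addressed by the top m input bits, computes quadratic coefficients by interpolation at Chebyshev nodes (a stand-in for minimax fitting), and models the truncation inside each arithmetic unit simply as rounding to t fractional bits. The `area` function is an invented proxy (LUT bits plus a weighted count of partial-product bits). Instead of giving each component its own error budget, a parameter set is accepted whenever its total error, measured over every input, stays below one output ulp (2^-8).

```python
import math
from functools import lru_cache
from itertools import product

N = 8  # input and output fractional bits (assumed)


def f(x: float) -> float:
    return 2.0 ** x  # example target function on [0, 1) (assumed)


def rnd(v: float, bits: int) -> float:
    """Round v to the nearest multiple of 2**-bits, ties rounding up."""
    scale = 1 << bits
    return math.floor(v * scale + 0.5) / scale


@lru_cache(maxsize=None)
def segment_coeffs(m: int) -> tuple:
    """Per-segment (c0, c1, c2) of c0 + c1*d + c2*d**2, interpolating f at 3 Chebyshev nodes."""
    h = 2.0 ** -m
    out = []
    for s in range(1 << m):
        a = s * h
        t0, t1, t2 = (h / 2 * (1 - math.cos((2 * j + 1) * math.pi / 6)) for j in range(3))
        y0, y1, y2 = f(a + t0), f(a + t1), f(a + t2)
        d1 = (y1 - y0) / (t1 - t0)                      # Newton divided differences
        d2 = ((y2 - y1) / (t2 - t1) - d1) / (t2 - t0)
        out.append((y0 - d1 * t0 + d2 * t0 * t1, d1 - d2 * (t0 + t1), d2))
    return tuple(out)


def worst_error(m, w0, w1, w2, t, stop_at=math.inf) -> float:
    """Max |hardware result - f(x)| over all N-bit inputs (stops early once >= stop_at).
    The combined error covers approximation, coefficient quantization (w0, w1, w2
    fractional bits), rounding inside each arithmetic unit (t bits, standing in for
    truncation) and final rounding to N bits."""
    q = [(rnd(c0, w0), rnd(c1, w1), rnd(c2, w2)) for c0, c1, c2 in segment_coeffs(m)]
    low = N - m
    worst = 0.0
    for x in range(1 << N):
        c0, c1, c2 = q[x >> low]                   # top m bits address the LUT
        dx = (x & ((1 << low) - 1)) / (1 << N)     # remaining bits form the offset
        sq = rnd(dx * dx, t)                       # squarer output
        y = c0 + rnd(c1 * dx, t) + rnd(c2 * sq, t)  # two multipliers and an adder
        worst = max(worst, abs(rnd(y, N) - f(x / (1 << N))))
        if worst >= stop_at:
            break
    return worst


def area(m, w0, w1, w2, t) -> tuple:
    """Hypothetical cost proxy: (LUT bits, partial-product bits)."""
    low = N - m
    lut_bits = (1 << m) * ((1 + w0) + (1 + w1) + w2)  # for 2**x: c0, c1 < 2; c2 < 1
    pp_bits = (1 + w1) * low + w2 * (t - 2 * m) + low * low // 2
    return lut_bits, pp_bits


def search(arith_weight: float):
    """Cheapest grid point whose combined error is below 1 ulp (faithful rounding)."""
    bound = 2.0 ** -N
    grid = product(range(1, 4), range(8, 13), range(5, 11), range(2, 8), range(9, 13))

    def cost(p):
        lut, pp = area(*p)
        return lut + arith_weight * pp

    for p in sorted(grid, key=cost):  # first feasible point is the cheapest
        if worst_error(*p, stop_at=bound) < bound:
            return p
    return None


if __name__ == "__main__":
    for w in (0.0, 4.0):
        p = search(w)
        print(f"arith_weight={w}: (m, w0, w1, w2, t) = {p}, area = {area(*p)}")
```

Setting `arith_weight` to 0 makes the search minimize LUT bits alone, while a nonzero weight (4.0 here, an arbitrary choice) also charges for the arithmetic units; comparing the two results is a miniature version of the LUT-versus-arithmetic trade-off described above. Sorting the grid by cost means the first feasible point is the cheapest one, so infeasible points only cost an early-terminating error scan.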
Thesis Approval Form
Abstract (Chinese)
Abstract
Table of Contents
List of Figures
List of Tables
Chapter 1. Introduction
1.1 Research Motivation
1.2 Thesis Organization
Chapter 2. Background and Related Work
2.1 Classification of Function Approximation Methods
2.2 Table-Lookup Methods
2.3 Classification of Indirect Table-Lookup Methods
2.3.1 Computed-Bound Methods
2.3.2 Table-Bound Methods
2.3.3 In-between Methods
2.4 Polynomial Function Approximation
2.4.1 Investigated Functions
2.4.2 Partitioning
2.4.3 Determining the Partitioning Interval
2.4.4 Calculating the Coefficients
2.4.5 Faithful Rounding and Exact Rounding
2.5 Error Analysis in Piecewise Polynomial Approximation
2.5.1 Error Budget
2.6 Comparison of Coefficient Optimization Methods
Chapter 3. Applications of Truncated Multipliers and Squarers in Function Evaluation
3.1 Error Correction Methods for Truncated Multipliers
3.1.1 Constant Correction Truncated Multiplier
3.1.2 Variable Correction Truncated Multiplier
3.2 Error Correction Methods for Truncated Squarers
3.2.1 Constant Correction Truncated Squarer
3.2.2 Variable Correction Truncated Squarer
3.3 Tree Reduction of Parallel Multipliers and Squarers
3.4 Function Evaluation Architecture Using Truncated Multipliers and Squarers
Chapter 4. Combined Error Method
4.1 Method Description
4.2 Implementation
4.2.1 Optimizing Using Truncated-Matrix Units
4.2.2 Coefficient Adjustment and Design Optimization
4.2.3 Combined Error and Exhaustive Search
4.3 Optimization Objectives
4.3.1 Total Table Size Optimization
4.3.2 Total Area Optimization
4.4 Algorithm Flow
Chapter 5. Experimental Results and Comparisons
5.1 Comparison of Table-Size Optimization and Total-Area Optimization
5.2 Results for Each Function
5.3 Comparison with Published Results
Chapter 6. Conclusions and Future Work
6.1 Conclusions
6.2 Future Work
References
Electronic full text (publicly available on the Internet from 2022/07/30)