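Graduate student: 林為森
Graduate student (English): Wei-Sen Lin
Thesis title: 三維繪圖頂點處理器之整合型算術運算單元設計
Thesis title (English): Design of Unified Arithmetic Units for 3D Graphics Vertex Shader
Advisor: 蕭勝夫
Advisor (English): Shen-Fu Hsiao
Degree: Master's
Institution: National Sun Yat-sen University (國立中山大學)
Department: Department of Computer Science and Engineering (資訊工程學系研究所)
Discipline: Engineering
Field: Electrical Engineering and Computer Science
Thesis type: Academic thesis
Publication year: 2008
Graduation academic year: 96 (ROC calendar, i.e. 2007-2008)
Language: Chinese
Number of pages: 99
Chinese keywords (translated): matrix computation throughput; higher-order interpolation approximation; vertex shader
English keywords: Vertex Shader; higher-order approximation; throughput of the matrix computation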
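The vertex shader is one of the core components of a 3D computer graphics chip system. Its purpose is to accelerate coordinate transformation and lighting computation in the 3D graphics pipeline, and the arithmetic unit is its main hardware. This thesis proposes unified arithmetic unit architectures that integrate the floating-point vector arithmetic unit with the floating-point special function unit, so that some hardware units can be shared and area saved.
This thesis proposes three architectures. Proposed Architecture I comprises a SIMD floating-point vector arithmetic unit and a floating-point special function unit implemented with first-order interpolation approximation. Proposed Architecture II improves on the area of Architecture I: it uses higher-order interpolation approximation to reduce the total area of the special function tables, at the cost of extra hardware such as a squarer, a cuber, and a fourth-power unit. To reduce area further, the inner product required by the higher-order interpolation is computed by the existing floating-point vector dot-product unit, which in turn increases the latency of vector instructions. Proposed Architecture III targets the vector instruction latency and the matrix computation throughput. It has two floating-point vector dot-product units: one stands alone to reduce the latency of vector instructions, and the other is part of the floating-point special function unit. The two dot-product units can also cooperate on matrix operations, which doubles the matrix computation throughput.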
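The trade-off between first-order and higher-order table-based approximation can be illustrated with a small software model. This is only a sketch under stated assumptions, not the thesis's hardware design: the target function (the reciprocal 1/x on [1, 2)), the use of Taylor coefficients around each segment midpoint, and the segment counts are all chosen here for illustration, and the names `build_table`, `dot`, `approx_reciprocal`, and `max_error` are hypothetical. The point it shows is that a higher-order polynomial is evaluated as a dot product between a coefficient vector read from the table and a vector of powers of the offset, which is why a vector dot-product unit can be reused for it.

```python
def build_table(num_segments, order):
    """Per-segment Taylor coefficients of 1/x on [1, 2).

    Assumption for illustration: coefficients are the Taylor terms
    (-1)^k / x0^(k+1) around each segment midpoint x0.
    """
    table = []
    for s in range(num_segments):
        x0 = 1.0 + (s + 0.5) / num_segments
        table.append([(-1.0) ** k / x0 ** (k + 1) for k in range(order + 1)])
    return table


def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(p * q for p, q in zip(a, b))


def approx_reciprocal(x, table):
    """Approximate 1/x for x in [1, 2) using the segment table."""
    n = len(table)
    s = min(int((x - 1.0) * n), n - 1)   # segment index from the leading bits
    x0 = 1.0 + (s + 0.5) / n             # segment midpoint
    dx = x - x0
    # Powers 1, dx, dx^2, ...: in hardware these would come from units
    # such as a squarer, a cuber, and a fourth-power unit.
    powers = [dx ** k for k in range(len(table[s]))]
    return dot(table[s], powers)


def max_error(table, samples=4096):
    """Largest absolute error over evenly spaced sample points in [1, 2)."""
    worst = 0.0
    for i in range(samples):
        x = 1.0 + i / samples
        worst = max(worst, abs(approx_reciprocal(x, table) - 1.0 / x))
    return worst


if __name__ == "__main__":
    first_order = build_table(num_segments=256, order=1)
    fourth_order = build_table(num_segments=8, order=4)
    for name, table in (("first-order", first_order), ("fourth-order", fourth_order)):
        entries = sum(len(row) for row in table)
        print(name, "table entries:", entries, "max error:", max_error(table))
```

In this model the fourth-order table stores 40 coefficients against 512 for the first-order table, while its maximum error over the sampled points is smaller. Real designs would also have to account for coefficient word lengths and rounding, which this sketch ignores.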
The vertex shader, one of the core parts of a 3D graphics system, speeds up coordinate transformation and lighting in the 3D graphics pipeline, and the vector ALU is the key part of a vertex shader. This thesis proposes several unified architectures that integrate the floating-point vector arithmetic unit and the special function unit in order to share hardware resources. We propose three different architectures for the design of the unified vector ALU. The first architecture includes a single-instruction-multiple-data (SIMD) vector arithmetic unit and uses a table-based method with first-order approximation to calculate some special functions. The second architecture uses higher-order approximation to reduce the table sizes and shares the floating-point multipliers in the SIMD vector unit. The third architecture has two copies of the dot-product hardware, which can compute two dot-product operations in parallel and thus double the throughput of matrix computation. Furthermore, the two dot-product units can be used to perform the interpolation for special function calculation.
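The throughput claim for matrix computation can be made concrete with a simple cycle-count model. This is a sketch under assumptions that are not in the abstract: each dot-product unit is taken to finish one 4-element dot product per cycle, pipeline latency is ignored, and the function name `transform` and its `num_dot_units` parameter are hypothetical. A 4x4 matrix times a 4-element vertex needs four dot products, so one unit takes four issue cycles and two units take two.

```python
def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(p * q for p, q in zip(a, b))


def transform(matrix, vertex, num_dot_units):
    """Multiply matrix by vertex, issuing num_dot_units row dot products per cycle.

    Returns the transformed vertex and the number of issue cycles used
    (modeling assumption: one dot product per unit per cycle).
    """
    result = []
    cycles = 0
    for start in range(0, len(matrix), num_dot_units):
        rows = matrix[start:start + num_dot_units]
        result.extend(dot(row, vertex) for row in rows)
        cycles += 1
    return result, cycles


if __name__ == "__main__":
    # Homogeneous translation by (10, 20, 30), applied to the point (1, 2, 3).
    translate = [
        [1.0, 0.0, 0.0, 10.0],
        [0.0, 1.0, 0.0, 20.0],
        [0.0, 0.0, 1.0, 30.0],
        [0.0, 0.0, 0.0, 1.0],
    ]
    point = [1.0, 2.0, 3.0, 1.0]
    print(transform(translate, point, num_dot_units=1))
    print(transform(translate, point, num_dot_units=2))
```

Both calls produce the same transformed vertex; only the cycle count differs, which is the factor-of-two throughput gain in this simplified model.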
Chapter 1 Introduction
1.1 Research Motivation
1.2 Thesis Organization
Chapter 2 Introduction to the Arithmetic Units Required by a Vertex Shader
2.1 Introduction to 3D Graphics
2.1.1 3D Graphics Pipeline
2.1.2 Introduction to the Geometry Transformation Subsystem
2.2 Vertex Shader Overview
2.2.1 Basic Architecture of a Vertex Shader
2.2.2 Importance and Impact of Arithmetic Unit Design on the Vertex Shader
2.3 Arithmetic Operations Required in a Vertex Shader
2.3.1 Coordinate Transformation
2.3.2 Lighting Computation
Chapter 3 Survey of Related Work
3.1 Designs before 2005 (1999-2004)
3.2 Fixed-point SIMD Vertex Shader [5] [ISSCC'05][JSSC'06]
3.3 Floating-point SIMD Vertex Shader [6] [ISSCC'05][JSSC'06]
3.4 Multi-Thread VLIW Vertex Shader [3] [ISSCC'06]
3.5 LNS-based Vertex Shader [7] [ISSCC'07][JSSC'07]
3.6 Unified Vertex/Pixel Shader
3.7 Comparison and Discussion
Chapter 4 Architecture Design of the Unified Arithmetic Units
4.1 Proposed Architecture I
4.1.1 Architecture Overview
4.1.2 Instruction Set
4.1.3 Vector Arithmetic Unit
4.1.4 Special Function Unit
4.2 Proposed Architecture II
4.2.1 Architecture Overview
4.2.2 Instruction Set
4.2.3 Improvements and Methods
4.3 Proposed Architecture III
4.3.1 Architecture Overview
4.3.2 Instruction Set
4.3.3 Improvements and Methods
4.4 Comparison of the Proposed Architectures with Related Work
4.4.1 Arithmetic Unit Complexity Analysis
4.4.2 Matrix Computation Throughput Analysis and Comparison
4.4.3 Instruction Latency Analysis and Comparison
Chapter 5 Implementation and Verification
5.1 Synthesis Results
5.2 Verification Method and Flow
5.3 Comparison
Chapter 6 Conclusions and Future Work
6.1 Conclusions
6.2 Future Work
References
[1]. J. Sohn, et al., “A 50Mvertices/s Graphics Processor with Fixed-Point Programmable Vertex Shader for Mobile Applications,” IEEE International Solid-State Circuits Conference (ISSCC), Dig. Tech. Papers, pp. 192-193, Feb. 2005.
[2]. Donghyun Kim, et al., “An SoC with 1.3Gtexels/s 3D Graphics Full Pipeline Engine for Consumer Applications,” IEEE International Solid-State Circuits Conference (ISSCC), Dig. Tech. Papers, pp. 190-191, Feb. 2005.
[3]. Chang-Hyo Yu, et al., “A 120Mvertices/s Multi-threaded VLIW Vertex Processor for Mobile Multimedia Applications,” IEEE International Solid-State Circuits Conference (ISSCC), Dig. Tech. Papers, pp. 408-409, Feb. 2006.
[4]. Byeong-Gyu Nam, et al., “A 52.4mW 3D Graphics Processor with 141Mvertices/s Vertex Shader and 3 Power Domains of Dynamic Voltage and Frequency Scaling,” IEEE International Solid-State Circuits Conference (ISSCC), Dig. Tech. Papers, pp. 278-603, Feb. 2006.
[5]. Ju-Ho Sohn, et al., “A 155-mW 50-Mvertices/s Graphics Processor with Fixed-Point Programmable Vertex Shader for Mobile Applications,” IEEE Journal of Solid-State Circuits (JSSC), vol. 41, no. 5, May 2006.
[6]. Donghyun Kim, et al., “An SoC With 1.3 Gtexels/s 3-D Graphics Full Pipeline for Consumer Applications,” IEEE Journal of Solid-State Circuits (JSSC), vol. 41, no. 1, Jan. 2006.
[7]. Byeong-Gyu Nam, et al., “A Low-Power Unified Arithmetic Unit for Programmable Handheld 3-D Graphics Systems,” IEEE Journal of Solid-State Circuits (JSSC), vol. 42, no. 8, Aug. 2007.
[8]. David Harris, “An Exponentiation Unit for an OpenGL Lighting Engine,” IEEE Transactions on Computers, vol. 53, no. 3, Mar. 2004.
[9]. Byeong-Gyu Nam, et al., “Development of a 3-D Graphics Rendering Engine with Lighting Acceleration for Handheld Multimedia Systems,” IEEE Transactions on Consumer Electronics, vol. 51, no. 3, Aug. 2005.
[10]. Nobuhiro Ide, et al., “2.44-GFLOPS 300-MHz Floating-Point Vector-Processing Unit for High-Performance 3-D Graphics Computing,” IEEE Journal of Solid-State Circuits (JSSC), vol. 35, no. 7, July 2000.
[11]. C.-H. Chen and C.-Y. Lee, “A Cost Effective Lighting Processor for 3D Graphics Application,” International Conference on Image Processing, vol. 2, pp. 792-796, 24-28 Oct. 1999.
[12]. Byeong-Gyu Nam, Hyejung Kim, Hoi-Jun Yoo, “Power and Area-Efficient Unified Computation of Vector and Elementary Functions for Handheld 3D Graphics Systems,” IEEE Transactions on Computers, vol. 57, no. 4, Apr. 2008.
[13]. Hyejung Kim, Byeong-Gyu Nam, “A 231-MHz, 2.18-mW 32-bit Logarithmic Arithmetic Unit for Fixed-Point 3-D Graphics System,” IEEE Journal of Solid-State Circuits (JSSC), vol. 41, no. 11, Nov. 2006.
[14]. Atsushi Kunimatsu, Nobuhiro Ide, et al., “Vector Unit Architecture for Emotion Synthesis,” IEEE Micro, vol. 20, no. 2, Mar.-Apr. 2000.
[15]. Erik Lindholm, Mark J. Kilgard, Henry Moreton, “A User-Programmable Vertex Engine,” ACM SIGGRAPH, pp. 149-158, Aug. 2001.
[16]. Jeong-Ho Woo, Ju-Ho Sohn, Hyejung Kim, “A 195mW, 9.1MVertices/s Fully Programmable 3D Graphics Processor for Low Power Mobile Devices,” IEEE Asian Solid-State Circuits Conference, pp. 372-375, Nov. 2007.
[17]. Michael J. Schulte and Earl E. Swartzlander, “Hardware Designs for Exactly Rounded Elementary Functions,” IEEE Transactions on Computers, vol. 43, no. 8, Aug. 1994.
[18]. http://www.microsoft.com
[19]. http://www.opengl.org
[20]. http://www.khronos.org
[21]. John Kessenich, “OpenGL ES Shading Language,” language version 1.10, 2006.
[22]. Chang-Hyo Yu, Donghyun Kim and Lee-Sup Kim, “A 33.2Mvertices/sec Programmable Geometry Engine for Multimedia Embedded Systems,” IEEE International Symposium on Circuits and Systems (ISCAS), vol. 5, pp. 4574-4577, May 2005.
[23]. J. M. Muller, “‘Partially rounded’ small-order approximations for accurate, hardware-oriented, table-based methods,” 16th IEEE Symposium on Computer Arithmetic Proceedings, pp. 114-121, 15-18 June 2003.