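Graduate student: 黃冠潣
Graduate student (English name): Kuan-min Huang
Thesis title (translated from Chinese): Design, Implementation, and Verification of a Programmable Vertex Processor Supporting Both Floating-Point and Fixed-Point Arithmetic
Thesis title (English): Design, Implementation, and Verification of a Programmable Floating- and Fixed-Point Vertex Shader
Advisor: 蕭勝夫
Advisor (English name): Shen-Fu Hsiao
Degree: Master's
Institution: National Sun Yat-sen University (國立中山大學)
Department: Department of Computer Science and Engineering
Discipline: Engineering
Field: Electrical Engineering and Computer Science
Thesis type: Academic thesis
Publication year: 2009
Graduation academic year: 97 (ROC calendar)
Language: Chinese
Number of pages: 136
Keywords (translated from Chinese): geometry operations; 3D graphics; vertex processor
Keywords (English): Vertex Shader; SIMD; Programmable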
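The 3D graphics pipeline can be broadly divided by function into a geometry subsystem and a rendering subsystem. Geometry subsystem hardware takes one of two main forms. One is the fixed-function pipeline, whose hardware pipeline is fixed and whose processing flow is inflexible. The other is the user-programmable vertex shader, which can produce different results according to the user's needs, is more flexible, and has gradually become the mainstream design approach. This thesis designs a programmable geometry subsystem according to the OpenGL ES 2.0 specification. It proposes a SIMD (Single-Instruction Multiple-Data) multiply-accumulate (MAC) unit as the main architecture of the programmable vertex processor, integrating floating-point and fixed-point vector units with a floating-point special function unit. It also provides a custom instruction set that lets the user choose fixed-point or floating-point, and vector or purely scalar, operation. In addition, the hardware is optimized for area, performance, and power consumption.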
The 3D graphics pipeline can be divided into two subsystems: the geometry subsystem and the rendering subsystem.
Hardware implementations of transformation and lighting in the geometry subsystem fall into two categories: the fixed-function pipeline and the programmable vertex shader. This thesis proposes a programmable vertex shader design based on the OpenGL ES 2.0 specification. We start from the design of the instruction set and use a multiplier-accumulator (MAC)-based SIMD (Single-Instruction Multiple-Data) structure. The vertex shader supports both floating-point and fixed-point operations in both scalar and vector formats. In addition, a special function unit for computing complicated functions is integrated into the vertex shader. We also do our best to minimize cost, power, and delay throughout the design process.
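As a rough illustration of the MAC-based SIMD idea in the abstract, the sketch below transforms a vertex by a 4x4 matrix using four 4-lane multiply-accumulate steps, in either floating point or fixed point. It is not the thesis's design: the Q16.16 fixed-point format, the column-wise evaluation order, and all names (`FRAC_BITS`, `to_fixed`, `simd_mac`, `transform`) are assumptions made for this example.

```python
# Minimal sketch of a 4-way SIMD multiply-accumulate (MAC) vertex transform.
# Assumption: Q16.16 fixed point; the record does not state the thesis's format.
FRAC_BITS = 16


def to_fixed(x):
    """Convert a float to the assumed Q16.16 fixed-point integer."""
    return int(round(x * (1 << FRAC_BITS)))


def to_float(q):
    """Convert an assumed Q16.16 fixed-point integer back to a float."""
    return q / (1 << FRAC_BITS)


def simd_mac(acc, a, b, fixed):
    """One 4-lane MAC step: acc[i] + a[i] * b[i] for each lane i."""
    if fixed:
        # The product of two Q16.16 values has 32 fraction bits; shift back to 16.
        return [c + ((x * y) >> FRAC_BITS) for c, x, y in zip(acc, a, b)]
    return [c + x * y for c, x, y in zip(acc, a, b)]


def transform(matrix_cols, vertex, fixed=False):
    """Compute M * v, with M given as four columns, using four MAC steps.

    Each step broadcasts one vertex component to all four lanes and
    accumulates it against one matrix column.
    """
    acc = [0, 0, 0, 0] if fixed else [0.0, 0.0, 0.0, 0.0]
    for column, component in zip(matrix_cols, vertex):
        acc = simd_mac(acc, column, [component] * 4, fixed)
    return acc


if __name__ == "__main__":
    # Translation by (2, 3, 4), stored column by column.
    cols = [[1.0, 0.0, 0.0, 0.0],
            [0.0, 1.0, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [2.0, 3.0, 4.0, 1.0]]
    v = [1.0, 1.0, 1.0, 1.0]
    print(transform(cols, v))
    fixed_cols = [[to_fixed(e) for e in col] for col in cols]
    fixed_v = [to_fixed(e) for e in v]
    print([to_float(q) for q in transform(fixed_cols, fixed_v, fixed=True)])
```

Sharing one accumulation loop between the two number formats mirrors, very loosely, the idea of a single datapath serving both modes; a real hardware unit would also handle rounding, overflow, and normalization, which this sketch omits.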
Chapter 1 Introduction
1.1 Thesis Outline
1.2 Motivation
1.3 Contributions
Chapter 2 Background and Related Work
2.1 3D Graphics Pipeline
2.1.1 3D Graphics API Specifications
2.1.2 Comparison of Geometry Architectures: Fixed-Function Geometry Engine vs. Programmable Geometry Engine (Vertex Shader)
2.2 Operations Performed by the Geometry System
2.2.1 Coordinate Transformation (Transformation)
2.2.2 Color Computation after Coordinate Transformation (Lighting)
2.2.3 Culling & Clipping
2.3 Related Papers
2.3.1 Designs before 2005 (1999 to 2004)
2.3.2 Fixed-Point SIMD Vertex Shader
2.3.3 Floating-Point SIMD Vertex Shader
2.3.4 Multi-Thread VLIW Vertex Shader
2.3.5 LNS-Based Vertex Shader
2.3.6 Unified Vertex/Pixel Shader
2.3.7 Multimedia Processor
2.3.8 Comparison and Discussion
2.4 Overview of the Proposed Architecture
Chapter 3 Vertex Shader Overview
3.1 Overall Architecture
3.2 Importance and Impact of Arithmetic Units in the Vertex Processor
3.3 Arithmetic Operations Required in the Vertex Processor
3.3.1 Coordinate Transformation (Transformation)
3.3.2 Lighting Computation (Lighting)
3.4 Vertex Shader Instruction Set
3.4.1 Instruction Set Design
3.4.2 Composing the Mathematical Operations of the Geometry System from the Instruction Set
3.5 Vertex Shader Hardware Design
3.5.1 Four-Way Floating- and Fixed-Point SIMD Vector Unit
3.5.2 Special Function Unit (SFU)
3.6 Communication Interface
Chapter 4 Pipelining (Pipeline)
4.1 Pipelined Design of the Vertex Shader
4.2 Data Hazards and Forwarding (Hazard and Forwarding)
4.2.1 Resolving Data Hazards: Forwarding
4.2.2 Timing Diagrams of Geometry System Operations
4.2.3 Analysis of the Efficiency and Correctness of the Ordering Method
4.3 Performance Evaluation of the Pipelined Architecture
Chapter 5 Implementation and Verification Results
5.1 Synthesis Results
5.2 Verification Method and Flow
5.3 Comparison
Chapter 6 Conclusions and Future Work
6.1 Conclusions
6.2 Future Work
References
Appendix A
List of Figures
[Figure 2-1] Overview of OpenGL ES Operation
[Figure 2-2] Primitive Assembly
[Figure 2-3] Rasterization
[Figure 2-4] OpenGL ES 1.X Fixed-Function Graphics Pipeline
[Figure 2-5] Distribution of Geometry Engine Functions
[Figure 2-6] Geometry Subsystem Engine Architecture
[Figure 2-7] OpenGL ES 2.0 Programmable 3D Graphics Pipeline
[Figure 2-8] Geometry Operations in OpenGL
[Figure 2-9] Coordinate Transformation Pipeline
[Figure 2-10] Effect of Scaling Text
[Figure 2-11] Effect of Translating Text
[Figure 2-12] Effect of Rotating Text
[Figure 2-13] Coordinate System after the Viewing Transformation
[Figure 2-14] Viewport Transformation
[Figure 2-15] Lighting Effects
[Figure 2-16] Vectors Used in the Lighting Computation
[Figure 2-17] Clipping Operation
[Figure 2-18] Culling Operation
[Figure 2-19] Vector Processor Architecture
[Figure 2-20] VPU1 Architecture
[Figure 2-21] Overview of the Programmable Vertex Processor
[Figure 2-22] Operating Modes
[Figure 2-23] (A)(B) Matrix Operations and Their Throughput; (C) Hardware Architecture of a Single Multiply-Add Unit
[Figure 2-24] Instruction-Level Power Management
[Figure 2-25] Geometry Transformation Engine Architecture
[Figure 2-26] Special Function Unit Architecture
[Figure 2-27] VLIW Architecture
[Figure 2-28] Pre-Cache/Post-Cache Diagram
[Figure 2-29] Logarithmic Converter Architecture
[Figure 2-30] Fractional Part Generator (FPGEN)
[Figure 2-31] Unified Arithmetic Unit Architecture
[Figure 2-32] Internal Architecture of the LNS Stage
[Figure 2-33] Programmable CPA Tree Architecture
[Figure 2-34] Hardware Resources for the Inner-Product Operation
[Figure 2-35] Throughput of Matrix Operations
[Figure 2-36] Hardware Resources for Matrix Operations
[Figure 2-37] Graphics Processor Architecture
[Figure 2-38] Pixel-Vertex Multithreading
[Figure 2-39] Fully Programmable 3-D Graphics Processor
[Figure 2-40] Stream Processor
[Figure 2-41] Overall Architecture Overview
[Figure 2-42] Hierarchical Architecture Overview
[Figure 3-1] Vertex Shader System Overview
[Figure 3-2] Multiply-Accumulate Unit Hardware Architecture
[Figure 3-3] Lighting Computation Flow
[Figure 3-4] Negative/Swizzle
[Figure 3-5] Write Mask
[Figure 3-6] Matrix Multiplication
[Figure 3-7] Flow of Operations Required by the Geometry System
[Figure 3-8] Vertex Shader 4-Way SIMD Vector Unit
[Figure 3-9] Swizzle/Negative
[Figure 3-10] Floating-Point Adder
[Figure 3-11] Floating-Point Multiplier
[Figure 3-12] Vertex Shader 4-Way SIMD Datapath and Special Function Unit
[Figure 3-13] Relationship among Approximation Order, Table Size, and Absolute Error
[Figure 3-14] Hardware Unit for N-th Order Interpolation Approximation
[Figure 3-15] Special Function Unit Architecture
[Figure 3-16] Blocks Covered in the Interface Discussion
[Figure 3-17] Vertex Shader System Flow
[Figure 4-1] Pipeline Timing Diagram
[Figure 4-2] Hardware Pipeline
[Figure 4-3] Timing Diagram of Transformation I
[Figure 4-4] Timing Diagram of Transformation II
[Figure 4-5] Timing Diagram of Transformation III
[Figure 4-6] Timing Diagram of Transformation IV
[Figure 4-7] Timing Diagram of Transformation V
[Figure 4-8] Timing Diagram of Transformation VI
[Figure 4-9] Timing Diagram of Lighting I
[Figure 4-10] Timing Diagram of Lighting II
[Figure 4-11] Timing Diagram of Lighting III
[Figure 4-12] Timing Diagram of Lighting IV
[Figure 4-13] Timing Diagram of Lighting V
[Figure 4-14] Timing Diagram of Lighting VI
[Figure 4-15] Timing Diagram of Lighting VII
[Figure 4-16] Timing Diagram of Lighting VIII
[Figure 4-17] Timing Diagram of Lighting IX
[Figure 4-18] Timing Diagram of Lighting X
[Figure 4-19] Timing Diagram of Lighting XI
[Figure 5-1] Area Percentage of Each Part of the Vertex Shader System
[Figure 5-2] Proportion of Each Arithmetic Logic Unit in the Vertex Shader Datapath
[Figure 5-3] Function- and Gate-Level Verification Flow for Arithmetic Units
List of Tables
[Table 2-1] Summary of Transformation Matrices
[Table 2-2] Clipping
[Table 2-3] Summary of Features of Related Papers
[Table 3-1] Analysis of Arithmetic Units
[Table 3-2] Instruction Format Description
[Table 3-3] Instruction Set
[Table 4-1] Performance under Pipelining
[Table 5-1] Vertex Shader System Synthesis Results and Area Ratio of Each Unit
[Table 5-2] Analysis of Vertex Shader Datapath Synthesis Results
[Table 5-3] Relative Error of Special Function Instructions
[Table 5-4] Functional Comparison with Related Papers
[Table 5-5] Comparison of Results
[Table 6-1] Vertex Specification and Characteristics
[1]. J.-H. Sohn, et al., "A 50Mvertices/S Graphics Processor With Fixed-Point Programmable Vertex Shader For Mobile Applications", IEEE International Solid-State Circuits Conference (ISSCC), Dig. Tech. Papers, pp. 192-193, Feb. 2005.
[2]. D. Kim, et al., "An SoC With 1.3Gtexels/S 3D Graphics Full Pipeline Engine For Consumer Applications", IEEE International Solid-State Circuits Conference (ISSCC), Dig. Tech. Papers, pp. 190-191, Feb. 2005.
[3]. C.-H. Yu, K. Chung, D. Kim, and L.-S Kim, "A 120Mvertices/S Multi-Threaded VLIW Vertex Processor For Mobile Multimedia Applications", IEEE International Solid-State Circuits Conference (ISSCC), Dig. Tech. Papers, pp. 408-409, Feb., 2006.
[4]. B.-G Nam, J. Lee, S. J. Lee, and H.-J Yoo, "A 52.4mw 3D Graphics Processor With 141Mvertices/S Vertex Shader And 3 Power Domains Of Dynamic Voltage And Frequency Scaling", IEEE International Solid-State Circuits Conference (ISSCC) , Dig. Tech. Papers, pp. 278-603, Feb., 2007.
[5]. J. Sohn, et al., "A 155-Mw 50-Mvertices/S Graphics Processor With Fixed-Point Programmable Vertex Shader For Mobile Applications", IEEE JOURNAL OF SOLID-STATE CIRCUITS (JSSC), VOL. 41, pp. 1081-1091, NO. 5, MAY 2006.
[6]. D. Kim, et al., "An Soc With 1.3 Gtexels/S 3-D Graphics Full Pipeline For Consumer Applications", IEEE JOURNAL OF SOLID-STATE CIRCUITS (JSSC), VOL. 41, pp. 71-84, NO. 1, JANUARY 2006.
[7]. B.-G Nam, H. Kim, and H-J Yoo, "A Low-Power Unified Arithmetic Unit For Programmable Handheld 3-D Graphics Systems", IEEE JOURNAL OF SOLID-STATE CIRCUITS (JSSC), VOL. 42, pp. 1767-1778, NO. 8, AUGUST 2007.
[8]. D. Harris, "An Exponentiation Unit For An OpenGL Lighting Engine", IEEE TRANSACTIONS ON COMPUTERS, VOL. 53, pp. 254-254, NO. 3, MARCH 2004.
[9]. B.-G. Nam, M.-W. Lee, and H.-J. Yoo, "Development Of A 3-D Graphics Rendering Engine With Lighting Acceleration For Handheld Multimedia Systems," IEEE Transactions On Consumer Electronics, VOL. 51, pp. 1020-1027, No. 3, AUGUST 2005.
[10]. N. Ide, et al., "2.44-GFLOPS 300-Mhz Floating-Point Vector-Processing Unit For High-Performance 3-D Graphics Computing", IEEE JOURNAL OF SOLID-STATE CIRCUITS (JSSC), VOL. 35, pp. 1025-1033, NO. 7, JULY 2000.
[11]. C.-H. Chen, C.-Y. Lee., "A Cost Effective Lighting Processor For 3D Graphics Application," International Conference On Image Processing, VOL. 2, pp.792-796, NO.8, Oct. 1999.
[12]. B.-G Nam, H. Kim, and H.-J. Yoo, "Power And Area-Efficient Unified Computation Of Vector And Elementary Functions For Handheld 3D Graphics Systems," IEEE TRANSACTIONS ON COMPUTERS, VOL. 57, pp.490-504, NO. 4, APRIL 2008.
[13]. H. Kim, et al., "A 231-Mhz, 2.18-Mw 32-Bit Logarithmic Arithmetic Unit For Fixed-Point 3-D Graphics System", IEEE JOURNAL OF SOLID-STATE CIRCUITS (JSSC), VOL. 41, pp. 2373-2381, NO. 11, NOVEMBER 2006.
[14]. A. Kunimatsu, et al., "VECTOR UNIT ARCHITECTURE FOR EMOTION SYNTHESIS", IEEE MICRO, VOL. 20, pp. 40-47, Issue 2, March-April 2000.
[15]. E. Lindholm, M. J. Kilgard, and H. Moreton, "A User-Programmable Vertex Engine", ACM SIGGRAPH, pp. 149-158, August 2001.
[16]. J.-H Woo, et al., "A 195mw, 9.1mvertices/S Fully Programmable 3D Graphics Processor For Low Power Mobile Devices", IEEE Asian Solid-State Circuits Conference (ASSCC), pp. 372 - 375, Nov. 2007.
[17]. M. J. Schulte, E. E. Swartzlander, "Hardware Designs For Exactly Rounded Elementary Functions", IEEE TRANSACTIONS ON COMPUTERS, VOL. 43, pp. 964-973, NO. 8, AUGUST 1994.
[18]. Http://www.Microsoft.Com
[19]. Http://www.Opengl.Org
[20]. Http://www.Khronos.Org
[21]. J. Kessenich, "OpenGL ES Shading Language", Language Version 1.10, 2006.
[22]. C.-H. Yu, D. Kim, and L.-S. Kim, "A 33.2Mvertices/Sec Programmable Geometry Engine For Multimedia Embedded Systems", IEEE Circuits And Systems (ISCAS), VOL. 5, pp. 4574-4577, May 2005.
[23]. J. M. Muller, "'Partially Rounded' Small-Order Approximations For Accurate, Hardware-Oriented, Table-Based Methods," 16th IEEE Symposium On Computer Arithmetic Proceedings, pp. 114 - 121, 15-18 Jun. 2003.
[24]. C.-H. Yu, et al., "An Energy-Efficient Mobile Vertex Processor With Multithread Expanded VLIW Architecture And Vertex Caches", IEEE JOURNAL OF SOLID-STATE CIRCUITS (JSSC), VOL. 42, pp. 2257-2269, NO. 10, OCTOBER 2007
[25]. J.-H. Woo, et al., "A 195Mw, 9.1mvertices/S Fully Programmable 3-D Graphics Processor For Low-Power Mobile Devices", IEEE JOURNAL OF SOLID-STATE CIRCUITS (JSSC), VOL.43, pp. 220-221, NO. 11, NOVEMBER 2008
[26]. J.-H. Woo, et al., "A 195Mw Mobile Multimedia SoC With Fully Programmable 3-D Graphics And MPEG4/H.264/JPEG", IEEE JOURNAL OF SOLID-STATE CIRCUITS (JSSC), VOL.43, pp.220-221, NO. 9, NOVEMBER 2008
[27]. S.-Y. Chien, et al., "An 8.6mw 25Mvertices/S 400-MFLOPS 800-MOPS Multimedia Stream Processor Core For 8.91mm2 Mobile Applications", IEEE JOURNAL OF SOLID-STATE CIRCUITS (JSSC), VOL.43, pp. 2025-2035, NO. 9, NOVEMBER 2008
[28]. Y. Tsao, C.-H. Chang, Y.-C. Lin, S.-Y. Chien, and L.-G Chen, "An 8.6Mw 12.5Mvertices/S 800MOPS 8.91mm2 Stream Processor Core For Mobile Graphics And Video Applications", IEEE Symposium On VLSI Circuits Digest Of Technical Papers, pp. 218-219, June. 2007
[30]. A. Munshi, "OpenGL ES Common/Common-Lite Profile Specification", Ver. 1.1, Nov. 2004.
[31]. T.-Y. Huang, "Hardware Design, Integration, And Verification Of Geometry Engine In 3D Graphics", National Sun Yat-sen University, July 2006.
[32]. J. Kessenich, "OpenGL ES Shading Language", Language Version 1.10, 2006.
[33]. M. D. Ercegovac, T. Lang, "Digital Arithmetic," Morgan Kaufmann Publishers, pp. 182 - 237, 2004
[34]. J. Cao, B. W. Y. Wei, "High-performance hardware for function generation", 13th IEEE Symposium on Computer Arithmetic Proceedings, pp. 184 - 186, 6 - 9 Jul. 1997
[35]. J. Cao, et al., "High-performance architectures for elementary function", 13th IEEE Symposium on Computer Arithmetic Proceedings, pp. 136 - 144, 11 - 13 Jun. 2001
[36]. M. J. Schulte, J. E. Stine, "Approximating elementary functions with symmetric bipartite tables", IEEE TRANSACTIONS ON COMPUTERS, vol. 48, no. 8, pp. 842-847, Aug. 1999
[37]. F. Dinechin, A. Tisserand, "Some improvements on multipartite table methods," 15th IEEE Symposium on Computer Arithmetic Proceedings, pp. 128 - 135, 11 - 13 Jun. 2001
[38]. W.-S. Lin, "Design of Unified Arithmetic Units for 3D Graphics Vertex Shader", National Sun Yat-sen University, July 2008.