臺灣博碩士論文加值系統 (National Digital Library of Theses and Dissertations in Taiwan)

Detailed Record

Author: 孫志豪
Author (English): Chih-Hao Sun
Title: 適用於行動多媒體應用之低功率繪圖處理器
Title (English): Low Power Graphics Processing Units with Programmable Texture Unit and Universal Rasterizer for Mobile Multimedia Applications
Advisor: 簡韶逸
Advisor (English): Shao-Yi Chien
Degree: Master's
Institution: National Taiwan University
Department: Graduate Institute of Electronics Engineering
Discipline: Engineering
Field: Electrical and Computer Engineering
Document type: Academic thesis
Year of publication: 2008
Graduation academic year: 96 (2007–2008)
Language: English
Number of pages: 104
Keywords (Chinese): 繪圖處理器
Keywords (English): Graphics Hardware; Graphics Processing Units
Usage statistics:
  • Cited by: 1
  • Views: 268
  • Rating: (none)
  • Downloads: 0
  • Bookmarked: 0
In recent years, handheld mobile devices have grown rapidly, and multimedia applications such as audio/video playback and image processing have been integrated into devices like mobile phones, personal digital assistants (PDAs), and portable media players (PMPs). In addition, visually rich graphical user interfaces (GUIs) and 3D games are regarded as the driving force behind the next wave of growth in handheld devices. In modern graphics processors, the evolution of the programmable shader has greatly increased the controllability and programmability of graphics hardware, so GPUs are increasingly used for applications beyond traditional 3D computer graphics; such applications are called GPGPU (general-purpose computing on graphics processing units). As a result, more and more hardware accelerators for various audio/video standards, together with graphics processors, are being embedded in handheld platforms. From a system point of view, integrating the video accelerator with the graphics processor not only raises hardware utilization and reduces chip area and cost, but also lowers power consumption, a critical factor for the next generation of handheld multimedia devices. This thesis proposes three techniques to achieve low power and high hardware efficiency: the Universal Rasterizer, the Programmable Filtering Unit, and Mipmapping Texture Compression.
First, we propose the Universal Rasterizer. To reduce hardware complexity, we propose an efficient tile-based traversal algorithm and a hardware-sharing architecture. The evaluation results show that when the proposed architecture is integrated into a 3D graphics system, it meets the requirements for real-time and efficient processing.
Next, we propose the Programmable Filtering Unit. This programmable unit provides a new data-streaming path that allows more non-graphics applications, such as video coding and image processing, to be implemented efficiently on the GPU. As case studies, we map motion compensation, the most computation-intensive part of video decoding, and a video segmentation technique onto the unit. For these two applications, overall system performance improves by 28.4% and 60%, respectively.
Finally, we present the mipmapping texture compression algorithm. According to our analysis, this compression reduces texture bandwidth by about 80%. By encoding the alpha and color components together, it also improves texture compression quality and efficiency.
All three techniques are integrated into a low-power 3D graphics processor. The processor supports multimedia stream processing, and we realize it as a system-on-a-chip platform. The prototype chip is fabricated in UMC 90 nm technology with an area of 5×5 mm2. It delivers 200 Mvertices/s, 400 Mpixels/s, and 1.6 Gtexels/s, equivalent to 11 GFLOPS.
In the current graphics pipeline, programmable vertex, pixel, and geometry shaders provide programmers with increased flexibility for different rendering applications.
Programmable graphics processing units (GPUs) support not only high-quality rendering algorithms but also a large number of general-purpose computations that are mapped onto the graphics hardware; such computations are called general-purpose computations on GPUs (GPGPU). This concept is particularly beneficial for mobile systems. Owing to the development of advanced GPGPU techniques, we can establish a unified mobile multimedia subsystem by processing different types of content on GPUs; this reduces the cost of the entire system through high hardware utilization and efficiency. However, a mobile device is by definition battery-powered and small enough to be portable, so it is important that the system consume as little energy as possible. In this thesis, we present three units adapted for mobile GPUs: the Universal Rasterizer, the Programmable Filtering Unit (PFU), and High-Quality Mipmapping Texture Compression with Alpha Map (MTC).
First, a Universal Rasterizer that performs tile-scan triangle traversal with edge equations at low complexity is proposed, together with an efficient tiled triangle traversal algorithm. The results show that it minimizes the processing time of triangle traversal and guarantees that no pixel is visited more than once. In addition, the improved hardware architecture realizes the traversal and rasterization algorithm efficiently: with extensive hardware sharing and digital signal processing techniques such as pipelining and scheduling, it meets the real-time requirements of graphics applications.
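To make the idea of edge-equation rasterization over tiles concrete, the following is a minimal software sketch rather than the thesis's hardware design: it sets up the three half-plane edge functions of a triangle and tests pixel centers tile by tile within the bounding box. The actual Universal Rasterizer uses a more efficient inter-tile traversal order that avoids revisiting tiles; see Chapter 3.

```python
def edge_setup(v0, v1):
    # Coefficients (A, B, C) of the half-plane edge function
    # E(x, y) = A*x + B*y + C for the directed edge v0 -> v1.
    # For counter-clockwise triangles, E >= 0 on the interior side.
    A = v0[1] - v1[1]
    B = v1[0] - v0[0]
    C = v0[0] * v1[1] - v0[1] * v1[0]
    return A, B, C

def rasterize_tiled(v0, v1, v2, tile=4, width=64, height=64):
    # Visit 4x4 tiles covering the bounding box, then test the pixel
    # centers inside each candidate tile against all three edge functions.
    edges = [edge_setup(v0, v1), edge_setup(v1, v2), edge_setup(v2, v0)]
    xmin = max(int(min(v0[0], v1[0], v2[0])), 0)
    xmax = min(int(max(v0[0], v1[0], v2[0])) + 1, width)
    ymin = max(int(min(v0[1], v1[1], v2[1])), 0)
    ymax = min(int(max(v0[1], v1[1], v2[1])) + 1, height)
    covered = []
    for ty in range(ymin - ymin % tile, ymax, tile):
        for tx in range(xmin - xmin % tile, xmax, tile):
            for y in range(ty, min(ty + tile, ymax)):
                for x in range(tx, min(tx + tile, xmax)):
                    px, py = x + 0.5, y + 0.5  # sample at the pixel center
                    if all(A * px + B * py + C >= 0 for A, B, C in edges):
                        covered.append((x, y))
    return covered

# A small counter-clockwise triangle; prints the number of covered pixels.
print(len(rasterize_tiled((2, 2), (20, 4), (6, 18))))
```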
Second, the Programmable Filtering Unit (PFU), a newly developed programmable unit for media-processing applications, is implemented on the stream-processing architecture of GPUs. The PFU is located in the texture unit of a GPU, and it can efficiently execute several types of filtering operations by directly accessing the multi-bank texture cache and specially designed data paths. Simulation results show that, compared with conventional texture units, the processing time of H.264/AVC motion compensation and video segmentation can be reduced by 28.4% and 60%, respectively, by using the PFU.
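As a rough illustration of why motion compensation maps well onto texture filtering, the sketch below fetches a prediction block from a reference frame at a fractional-pel motion vector using a bilinear fetch, the basic operation a texture unit provides. This is only a conceptual sketch: H.264/AVC actually uses a 6-tap filter for luma half-pel positions, and the PFU's real data paths and control scheme are described in Chapter 4.

```python
import numpy as np

def bilinear_fetch(tex, x, y):
    # Bilinear-filtered texel fetch at a non-integer coordinate (x, y).
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    fx, fy = x - x0, y - y0
    x1, y1 = min(x0 + 1, tex.shape[1] - 1), min(y0 + 1, tex.shape[0] - 1)
    top = (1 - fx) * tex[y0, x0] + fx * tex[y0, x1]
    bot = (1 - fx) * tex[y1, x0] + fx * tex[y1, x1]
    return (1 - fy) * top + fy * bot

def motion_compensate(ref, block_pos, mv_qpel, size=4):
    # Build a size x size prediction block by sampling the reference frame
    # at positions displaced by a quarter-pel motion vector.
    bx, by = block_pos
    mvx, mvy = mv_qpel
    pred = np.empty((size, size), dtype=np.float32)
    for j in range(size):
        for i in range(size):
            pred[j, i] = bilinear_fetch(ref, bx + i + mvx / 4.0, by + j + mvy / 4.0)
    return pred

ref = np.arange(64, dtype=np.float32).reshape(8, 8)   # toy reference frame
print(motion_compensate(ref, (2, 2), (1, 2)))         # mv = (0.25, 0.5) pixels
```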
Furthermore, we present a high-quality mipmapping texture compression (MTC) system with an alpha map. Our approach reduces texture memory traffic by 80% to 90%. By exploiting the similarity between the alpha channel and the luminance channel, the two channels are encoded together with linear prediction in Differential mode, while Split mode handles textures whose alpha and luminance channels are not strongly correlated. In addition, a layer overlapping technique is proposed to further reduce the texture memory bandwidth of MTC. Simulation results on a graphics platform show that MTC provides high image quality, low bandwidth, and a lower texture cache miss rate.
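The following sketch illustrates the two ingredients mentioned above in the simplest possible form; it is not the actual MTC codec, whose block format, modes, and bit allocation are defined in Chapter 5. It shows a fixed-rate block-truncation-style encoding of a 4x4 luminance block, and a Differential-mode-like linear prediction of the alpha channel from luminance when the two are strongly correlated.

```python
import numpy as np

def btc_encode_block(block):
    # Block-truncation-style coding: two representative levels plus a
    # one-bit-per-texel selection mask for a 4x4 block.
    mean = block.mean()
    mask = block >= mean
    hi = block[mask].mean() if mask.any() else mean
    lo = block[~mask].mean() if (~mask).any() else mean
    return lo, hi, mask

def btc_decode_block(lo, hi, mask):
    return np.where(mask, hi, lo)

def predict_alpha(luma, alpha):
    # Differential-mode idea: model alpha as a linear function of luminance
    # (slope * L + offset) and keep only a small residual.
    slope, offset = np.polyfit(luma.ravel(), alpha.ravel(), 1)
    residual = alpha - (slope * luma + offset)
    return slope, offset, residual

luma = np.random.randint(0, 256, (4, 4)).astype(np.float32)
alpha = np.clip(0.5 * luma + 20.0 + np.random.randn(4, 4), 0, 255)  # correlated alpha
lo, hi, mask = btc_encode_block(luma)
slope, offset, residual = predict_alpha(luma, alpha)
print("luma reconstruction error:", np.abs(btc_decode_block(lo, hi, mask) - luma).mean())
print("alpha residual magnitude:", np.abs(residual).mean())
```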
Integrating the three proposed units described above, a low-power graphics processing unit for mobile multimedia applications is implemented in this thesis. The prototype chip is fabricated in UMC 90 nm technology, and the chip size is 5×5 mm2. The designed working frequency is 200 MHz, and the worst-case power consumption is 26 mW. The processing capability of the chip is 200 Mvertices/s of geometry transform and 400 Mpixels/s and 1.6 Gtexels/s of texture filtering, or 11 GFLOPS with the PFU.
Abstract xi
1 Introduction 1
1.1 Basic Concept of Mobile Graphics Processing Units . . . . . . . . 2
1.2 The Trend of Mobile Graphics Processing Units . . . . . . . . . . 4
1.3 Research Contribution . . . . . . . . . . . . . . . . . . . . . . . 7
1.3.1 Efficient Triangle Traversal Using Universal Rasterizer . . 7
1.3.2 Programmable Filtering Unit for Mobile Multimedia Applications . . . 7
1.3.3 High-Quality Mipmapping Texture Compression with Alpha Map . . . 8
1.4 Thesis Organization . . . . . . . . . . . . . . . . . . . . . . . . . 9
2 Graphics Processing Units Pipeline and Hardware Overview 11
2.1 Algorithm of Rasterization Graphics Pipeline . . . . . . . . . . . 11
2.1.1 Application Stage . . . . . . . . . . . . . . . . . . . . . . 12
2.1.2 Geometry Stage . . . . . . . . . . . . . . . . . . . . . . . 12
2.1.3 Rasterization Stage . . . . . . . . . . . . . . . . . . . . . 15
2.1.4 Render Stage . . . . . . . . . . . . . . . . . . . . . . . . 16
2.2 The Hardware Architecture of Rasterization Graphics Processing Units . . . 18
2.2.1 Programmable Shader . . . . . . . . . . . . . . . . . . . 20
2.2.2 Input Assembler (IA) . . . . . . . . . . . . . . . . . . . . 21
2.2.3 Setup and Rasterizer (RS) . . . . . . . . . . . . . . . . . 21
2.2.4 Output Merger (OM) . . . . . . . . . . . . . . . . . . . . 22
2.2.5 Texture Sampler (TX) . . . . . . . . . . . . . . . . . . . 22
3 Tile-Scan Triangle Traversal with Edge Equations Using Universal Rasterizer 23
3.1 Introduction to Tile-Scan Triangle Traversal with Edge Equation . 23
3.2 Edge Equation Setup . . . . . . . . . . . . . . . . . . . . . . . . 25
3.3 Extend to General Interpolation . . . . . . . . . . . . . . . . . . . 27
3.4 The Efficient Tiled Triangle Traversal Algorithm . . . . . . . . . 28
3.4.1 Inter-Tile Traversal . . . . . . . . . . . . . . . . . . . . . 29
3.4.2 Interior Traversal . . . . . . . . . . . . . . . . . . . . . . 31
3.5 The Architecture of the Proposed Universal Rasterizer . . . . . . 36
3.6 Evaluation Result . . . . . . . . . . . . . . . . . . . . . . . . . . 37
3.6.1 Hardware analysis . . . . . . . . . . . . . . . . . . . . . 37
3.6.2 Simulation Result . . . . . . . . . . . . . . . . . . . . . . 38
3.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
4 Programmable Filtering Unit for Mobile Multimedia Applications 41
4.1 Introduction to Filtering Unit in Graphics Pipeline . . . . . . . . . 41
4.2 Conventional Texture Mapping . . . . . . . . . . . . . . . . . . . 42
4.3 Programmable Filtering Unit . . . . . . . . . . . . . . . . . . . . 45
4.3.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . 45
4.3.2 Architecture . . . . . . . . . . . . . . . . . . . . . . . . . 49
4.3.3 Control Scheme for Applications . . . . . . . . . . . . . . 49
4.4 Texture Unit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
4.4.1 Footprints Generator . . . . . . . . . . . . . . . . . . . . 53
4.4.2 Caching System . . . . . . . . . . . . . . . . . . . . . . 54
4.5 Hardware Evaluation . . . . . . . . . . . . . . . . . . . . . . . . 55
4.6 Application Study . . . . . . . . . . . . . . . . . . . . . . . . . . 56
4.6.1 H.264/AVC Motion Compensation . . . . . . . . . . . . . 56
4.6.2 Video Segmentation . . . . . . . . . . . . . . . . . . . . 57
4.7 Bandwidth Analysis . . . . . . . . . . . . . . . . . . . . . . . . . 60
4.8 Further Extension . . . . . . . . . . . . . . . . . . . . . . . . . . 60
4.9 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
5 High-Quality Mipmapping Texture Compression with Alpha Map 65
5.1 Introduction to Texture Compression . . . . . . . . . . . . . . . . 65
5.2 Background of Texture Compression . . . . . . . . . . . . . . . . 67
5.3 Mipmapping Texture Compression . . . . . . . . . . . . . . . . . 69
5.3.1 Mipmapping . . . . . . . . . . . . . . . . . . . . . . . . 69
5.3.2 Hierarchical Approach for Texture Compression . . . . . 69
5.3.3 Texture Access Efficiency for Mipmapping . . . . . . . . 72
5.4 Proposed Mipmapping Texture Compression Techniques with Alpha Map . . . 73
5.4.1 Compression . . . . . . . . . . . . . . . . . . . . . . . . 73
5.4.2 Decompression . . . . . . . . . . . . . . . . . . . . . . . 76
5.4.3 Layer Overlapping Technique for Mipmapping . . . . . . 78
5.5 Experimental Results . . . . . . . . . . . . . . . . . . . . . . . . 79
5.5.1 Mipmapping Texture Compression . . . . . . . . . . . . . 79
5.5.2 System Simulation . . . . . . . . . . . . . . . . . . . . . 81
5.5.3 VLSI Implementation Results . . . . . . . . . . . . . . . 84
5.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
6 The Implementation of Low Power Graphics Processing Units for Mobile Multimedia Applications 87
6.1 Architecture Overview . . . . . . . . . . . . . . . . . . . . . . . 88
6.1.1 Graphics Processing Units with Streaming Processing Architecture . . . 88
6.1.2 Integrated Multimedia System-on-a-Chip using Low Power Graphics Processing Units . . . 89
6.2 Design Flow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
6.3 Chip Layout and Specification . . . . . . . . . . . . . . . . . . . 93
6.4 Comparison . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
7 Conclusion 97