Author (English): Yen-Chang Lu
Thesis title (English): Approximate Rendering Based Low Power GPU for Mobile Devices
Advisor (English): Shao-Yi Chien
Keywords (English): GPU, low power, 3D rendering, mobile devices, SoC
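In this thesis, we propose a graphics pipeline with configurable resolution that uses approximate rendering to reduce the number of pixels that must be computed, thereby saving power. To keep the loss in rendering quality within an acceptable range, we also propose an error function that estimates the computation error. The method used to derive this error function applies to a wide variety of rendering scenes, and even to more general physical computations. As a result, the 3D graphics processor can more efficiently render images that meet the quality users expect, and energy consumption can be controlled at a finer granularity.
Combining the techniques above, we modified an existing low-power 3D graphics processor for mobile multimedia devices to test the proposed graphics pipeline, and implemented it as a system-on-chip platform. The prototype chip is fabricated in TSMC 65 nm technology, with an area of 4 × 4 mm², an operating frequency of 200 MHz, and a maximum power consumption of 128 mW.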

Due to the fast growth of the market for personal mobile devices and improvements in process technology, more and more multimedia applications are embedded in handheld devices. Since most of these devices use the screen as their main user interface, high visual quality has become one of the most important factors in evaluating the user experience. To meet this requirement, the GPU has become a major source of computing power in a mobile device. However, mobile devices may not have enough power, because their resources are inherently scarce. For example, a mobile phone's only power supply is its battery. It is therefore a big challenge to maintain visual quality when the battery is low.
In this thesis, we propose a low-power graphics pipeline based on an approximate rendering technique with an error control scheme. The approximate rendering technique aims to reduce the number of rendered pixels and hence the power spent on expensive computation. The proposed technique reaches a reduction rate of 50% in medium resolution mode and 80% in low resolution mode in most cases. Furthermore, the error control scheme estimates the error caused by the approximation, so the system can render objects adaptively. Consequently, with the error control scheme, we can render images with fewer pixels while still sustaining acceptable quality. Finally, using the proposed techniques, a multi-core GPU system for mobile devices is implemented. The prototype chip is fabricated in TSMC 65 nm technology, and the chip size is 4.0 × 4.0 mm². The designed working frequency is 200 MHz, and the worst-case power consumption is 128 mW.

1 Introduction
1.1 Introduction to Graphic Pipeline
1.2 Limitations for Embedded GPU on Mobile Device
1.3 Analysis of The Power Consumption for Graphic Pipeline
1.3.1 Level of Detail
1.3.2 Lighting Model
1.3.3 Texture Model
1.3.4 Summary
1.4 Motivation and Target
1.5 Thesis Organization
2 Previous Works
2.1 CORDIC Vector Interpolator
2.2 Geometry-aware Framebuffer Level of Detail
2.3 Spatio-Temporal Upsampling on GPU
3 Proposed Approximate Rendering Pipeline
3.1 Configurable Resolution Sampling
3.1.1 Structure Overview
3.1.2 Configurable Resolution Sampling
3.1.3 Repacking Pixels to Sampling Pattern
3.2 Error Estimation
3.2.1 Taylor Expansion Series
3.2.2 Error Function
3.2.3 Simple Example of Fast Lighting
3.2.4 Complicated Lighting: Phong Shading
3.2.5 Computation Overhead
3.2.6 Summary
3.3 Proposed Adaptive Resolution Rendering Architecture
3.4 Other Applications
3.4.1 Filtering
3.4.2 Frame Rate Control Mode
4 Architecture Design of Proposed Approximate Rendering Graphics Pipeline
4.1 Architecture Overview
4.2 Tile-based Architecture
4.3 Repacking and Sampling Table
4.3.1 Repacking Unit
4.3.2 Sampling Table
4.3.3 Upsample Unit
4.4 Error Estimation Unit
5 Experimental Results
5.1 Benchmark Images and Lighting Effects
5.2 Pixel Reduction Rate
5.3 Visual Quality
5.3.1 Error Control
5.3.2 Comparing to Previous Work
5.3.3 Error Estimation with Different Precision
6 Implementation of Low Power Graphics Processing Units for Mobile Multimedia Applications
6.1 Architecture Overview
6.2 Design Flow
6.3 Chip Layout and Specification
6.4 Comparison
7 Conclusion
Reference

