
National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)

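Graduate student: 黃重裕 (Chong-Yu Huang)
Thesis title: Design and Implementation of Low Power 2-D Transform Architecture with Unique Kernel for Multi-Standard Video Coding Applications
Thesis title (Chinese): 適用於多視訊編碼標準之低功率具單一核心的二維轉換架構設計與實現
Advisor: 賴永康 (Yeong-Kang Lai)
Degree: Master's
Institution: National Chung Hsing University (國立中興大學)
Department: Department of Electrical Engineering
Discipline: Engineering
Subject category: Electrical and Information Engineering
Thesis type: Academic thesis
Year of publication: 2007
Graduation academic year: 95 (ROC calendar)
Language: Chinese
Number of pages: 80
Keywords: discrete cosine transform (DCT), integer transform, VLSI, multi-standard, MPEG-1/2/4, H.264/AVC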
In recent years, digital signal processing has had a significant influence on portable electronic devices, and for mobile devices low power is the primary concern in circuit design. The discrete cosine transform (DCT) is widely used in image and video compression standards. A practical circuit must not only meet low-power requirements but also support the various image and video coding standards in wide use, yet no existing circuit design satisfies both needs. Designing a low-power DCT architecture for multi-standard video coding is therefore a topic worth studying.
In this thesis, we adopt the new distributed arithmetic algorithm (NEDA) to implement our architecture. The algorithm requires neither multipliers nor ROM, so the circuit can be built from simple shifters and adders. Using NEDA, we propose an efficient 2-D transform architecture whose single kernel performs the traditional 8x8 DCT as well as the 8x8 and 4x4 integer transforms of H.264/AVC, and so supports multi-standard video coding applications. Furthermore, we use an adder tree to overcome the low throughput caused by the distributed arithmetic (DA) algorithm. The resulting throughput reaches 400M pixels/s, and the architecture can process HD 720p, 1080p, and digital cinema video in real time at operating frequencies of 6 MHz, 12 MHz, and 48 MHz, respectively. To reduce power consumption, we find an effective way to reduce the number of additions and hence the amount of computation; for the traditional 8x8 DCT alone, the number of additions falls by 95.8%. According to the implementation results, the architecture consumes 38.7 mW at a 50 MHz clock. The architecture thus achieves low power through high throughput and low cost. Using the same approach, we also propose an IDCT architecture that supports multi-standard video coding. From the viewpoint of VLSI implementation, both architectures are simple, modular, and regular.
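The multiplier-free, ROM-free idea can be illustrated with a small software sketch. It is not the thesis's adder kernel: it only shows the general principle of distributing a constant-coefficient dot product over the bits of the quantized coefficients, so that each output needs only additions and shifts. The 12-bit coefficient precision (`FRAC_BITS`), the function names, and the use of the orthonormal 8-point DCT-II as the coefficient set are assumptions made for illustration, and the adder-sharing optimization described above is not reproduced.

```python
import math

FRAC_BITS = 12            # assumed coefficient precision (not from the thesis)
WIDTH = FRAC_BITS + 2     # sign bit + one integer bit + fractional bits

def quantize(c):
    """Round a real coefficient in [-1, 1) to a fixed-point integer."""
    return round(c * (1 << FRAC_BITS))

def shift_add_dot(q_coeffs, xs):
    """Return sum(q * x) using only additions, subtractions and shifts.

    For each bit position k, add up the inputs whose coefficient has a 1
    in that position, then weight the partial sum by 2**k. The top bit is
    the two's complement sign bit, so its partial sum is subtracted.
    """
    acc = 0
    for k in range(WIDTH):
        partial = 0
        for q, x in zip(q_coeffs, xs):
            if (q >> k) & 1:          # bit k of q in two's complement
                partial += x
        if k == WIDTH - 1:
            acc -= partial << k       # sign bit carries weight -2**k
        else:
            acc += partial << k
    return acc

def dct8_row(u):
    """Row u of the orthonormal 8-point DCT-II matrix."""
    cu = math.sqrt(0.5) if u == 0 else 1.0
    return [0.5 * cu * math.cos((2 * n + 1) * u * math.pi / 16)
            for n in range(8)]

Q_ROWS = [[quantize(c) for c in dct8_row(u)] for u in range(8)]

def dct8_shift_add(xs):
    """1-D 8-point DCT of integer samples, scaled by 2**FRAC_BITS."""
    return [shift_add_dot(row, xs) for row in Q_ROWS]
```

Dividing each output by `2**FRAC_BITS` recovers the DCT coefficient up to the coefficient quantization error. In hardware terms, the inner loop corresponds to a group of adders selected by the coefficient bit pattern, and the outer loop to a shift-accumulate stage.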
For the H.264/AVC standard, we use the same algorithm to propose a high-throughput direct 2-D multiple-transform architecture. It performs four integer transforms: the 4x4 forward integer transform, the Hadamard transform, the inverse integer transform, and the inverse Hadamard transform. According to synthesis results, it runs at a clock frequency of 100 MHz, giving a throughput of 800M pixels/s.
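For readers unfamiliar with these transforms, the following sketch computes the 4x4 forward integer (core) transform and the 4x4 Hadamard transform of H.264/AVC using only additions, subtractions, and one-bit shifts. It uses a row-column decomposition for clarity, which is a different organization from the direct 2-D architecture proposed in the thesis, and it omits the scaling that H.264/AVC folds into quantization. The function names are chosen for illustration only.

```python
# Standard H.264/AVC 4x4 forward core transform matrix and 4x4 Hadamard matrix.
CF = [[1, 1, 1, 1], [2, 1, -1, -2], [1, -1, -1, 1], [1, -2, 2, -1]]
H4 = [[1, 1, 1, 1], [1, 1, -1, -1], [1, -1, -1, 1], [1, -1, 1, -1]]

def core_1d(x):
    """1-D forward core transform (CF times x) as a butterfly."""
    s0, s1 = x[0] + x[3], x[1] + x[2]
    d0, d1 = x[0] - x[3], x[1] - x[2]
    return [s0 + s1, (d0 << 1) + d1, s0 - s1, d0 - (d1 << 1)]

def hadamard_1d(x):
    """1-D Hadamard transform (H4 times x) as a butterfly."""
    s0, s1 = x[0] + x[3], x[1] + x[2]
    d0, d1 = x[0] - x[3], x[1] - x[2]
    return [s0 + s1, d0 + d1, s0 - s1, d0 - d1]

def transform_2d(block, f):
    """Apply the 1-D transform f to every row, then to every column."""
    rows = [f(list(r)) for r in block]              # X * C^T
    cols = [f(list(c)) for c in zip(*rows)]         # C * (X * C^T), stored by column
    return [list(r) for r in zip(*cols)]            # transpose back to row-major

def matmul(a, b):
    """Plain integer matrix product, used only as a reference."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(m):
    return [list(r) for r in zip(*m)]
```

Calling `transform_2d(block, core_1d)` on a 4x4 block of integer residuals gives the same result as the matrix product `CF * block * CF^T`, and `transform_2d(block, hadamard_1d)` matches `H4 * block * H4^T`. Because both butterflies share the same first stage of sums and differences, one datapath can plausibly serve several transforms, which is the kind of sharing a multiple-transform design exploits.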
Chapter 1. Introduction
1.1 Video Compression Standard
1.2 Thesis Organization

Chapter 2. Overview of H.264 Video Coding Standard
2.1 Video Coding System
2.2 Transforms, Quantization and Scan
2.2.1 Discrete Cosine Transform
2.2.2 Hadamard Transform
2.2.3 Quantization and Scan
2.3 Variable Block-size Motion Compensation
2.4 Multiple Reference Pictures for Motion Compensation
2.5 Deblocking Filter
2.6 Intra-frame Prediction
2.7 Entropy Coding

Chapter 3. Architecture Design of 2-D Discrete Cosine Transform
3.1 Discrete Cosine Transform Algorithm
3.2 Previous Work
3.2.1 Row-Column Decomposition
3.2.2 2-D Direct Method
3.2.3 Systolic Approach
3.3 Motivation
3.4 Finite Wordlength Analysis
3.5 Our Proposed DCT Architecture
3.5.1 Adder Kernel Unit
3.5.2 Routing Unit and Final Unit
3.5.3 Transpose Memory Architecture
3.6 Our Proposed IDCT Architecture
3.7 Our Proposed Direct 2-D Multiple Transforms Architecture
3.7.1 Forward Integer Transform
3.7.2 Inverse Integer Transform
3.7.3 Hadamard and Inverse Hadamard Transform
3.7.4 2-D Multiple Transforms Architecture

Chapter 4. Chip Implementation
4.1 Chip Specification and Data Sheet
4.2 Design Flow, Verification Strategy and Design for Testability
4.2.1 Design Flow
4.2.2 Verification Strategy
4.2.3 Design for Testability
4.3 Implementation Result
4.3.1 Coding Style Checking
4.3.2 Code Coverage Analysis
4.3.3 Synthesis Result
4.3.4 Layout Result
4.3.5 DRC Summary Report
4.3.6 LVS Summary Report
4.3.7 FPGA Verification
4.4 Performance Comparison
Chapter 5. Conclusion
