
National Digital Library of Theses and Dissertations in Taiwan

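Author: 陳東杰
Author (English): Tung-Chien Chen
Thesis title (translated from Chinese): H.264/AVC Standard Encoder Supporting 720-Scan-Line High-Resolution Digital Video
Thesis title (English): Design and Implementation of H.264/MPEG-4 AVC Encoder for SDTV/HDTV Application
Advisor: 陳良基
Advisor (English): Liang-Gee Chen
Degree: Master's
Institution: National Taiwan University
Department: Graduate Institute of Electronics Engineering
Discipline: Engineering
Field: Electrical Engineering and Computer Science
Thesis type: Academic thesis
Year of publication: 2004
Graduation academic year: 92 (ROC calendar)
Language: English
Number of pages: 109
Keywords (translated from Chinese): encoder, standard, integrated circuit, video compression
Keywords (English): VLSI, JVT, standard, H.264, video, compression, AVC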
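H.264/MPEG-4 AVC is the latest generation of video compression standard. It improves compression efficiency by nearly 50%, at the cost of a large increase in complexity. This thesis proposes a four-stage macroblock-pipelined system architecture for an H.264 video encoding system, together with hardware architectures for the core computation of each stage. We first analyze the H.264/MPEG-4 video coding algorithms with a software C model to identify the problems that arise in hardware implementation, and then verify the system architecture with a Verilog model. The four-stage macroblock pipelining method, combined with hardware-oriented algorithms and scheduling, avoids the constraints imposed by the encoding loop and by algorithmic dependencies. Different degrees of parallelism and different architectures are applied to the core computation of each stage according to its algorithm, which yields high hardware utilization and maps the whole video coding algorithm onto hardware efficiently. In addition, sharing on-chip memory and transmitting data internally reduces the bandwidth requirement from the terabyte-per-second range to 223 Mbyte/s. Adjusting the Lagrangian mode decision further improves compression quality.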

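For the module architectures, many new hardware architectures and schedules are proposed to support the many new compression tools of H.264: for integer motion estimation, a Parallel SAD Tree architecture with eight sets of 128 processing elements and a snake scan flow; for fractional motion estimation, a decomposition of the Lagrangian inter mode decision loops and an architecture with 4x4 processing units and on-line interpolation; for intra prediction, a reconfigurable intra prediction processing element with four-fold parallelism, an interleaved schedule of 4x4 blocks and 16x16 blocks, and a partial distortion elimination algorithm; in the transform module, a multi-function reconfigurable transform unit; in the entropy coding module, a CAVLC encoder with 4x4-block pipeline scheduling; and a deblocking filter with built-in reconfigurable transpose registers. Integrating these key modules into the proposed encoding system enables real-time H.264 Baseline Profile encoding. At an operating frequency of 81 MHz the encoder supports SDTV at 30 frames per second with four reference frames; at 108 MHz it supports HDTV at 30 frames per second with one reference frame.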
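
The record does not describe the internals of the Parallel SAD Tree, so the following is only a software sketch of the general idea that commonly motivates such a design: the sum of absolute differences (SAD) of every H.264 partition of a 16x16 macroblock can be built by adding up the sixteen 4x4 SADs of one candidate position. The function names, the partition table, and the sample data are illustrative assumptions, not the thesis's hardware.

```python
# Sketch (assumption): derive all variable-block-size SADs of one 16x16
# candidate from its sixteen 4x4 SADs. Names here are hypothetical.

def sad_4x4_grid(cur, ref):
    """Return a 4x4 grid; entry [r][c] is the SAD of the 4x4 sub-block
    at block row r, block column c of the 16x16 macroblock."""
    grid = [[0] * 4 for _ in range(4)]
    for y in range(16):
        for x in range(16):
            grid[y // 4][x // 4] += abs(cur[y][x] - ref[y][x])
    return grid

def merge(grid, bh, bw):
    """Add up groups of bh x bw neighbouring 4x4 SADs."""
    out = [[0] * (4 // bw) for _ in range(4 // bh)]
    for r in range(4):
        for c in range(4):
            out[r // bh][c // bw] += grid[r][c]
    return out

# Partition name (width x height in pixels) -> group size in 4x4 units
# as (rows, columns).
PARTITIONS = {
    "16x16": (4, 4), "16x8": (2, 4), "8x16": (4, 2), "8x8": (2, 2),
    "8x4": (1, 2), "4x8": (2, 1), "4x4": (1, 1),
}

def variable_block_sads(cur, ref):
    """Compute the SADs of all partitions for one candidate block."""
    grid = sad_4x4_grid(cur, ref)
    return {name: merge(grid, bh, bw) for name, (bh, bw) in PARTITIONS.items()}

# Synthetic 16x16 pixel data, purely for illustration.
cur = [[(y * 16 + x) % 256 for x in range(16)] for y in range(16)]
ref = [[(x * 7 + y * 3) % 256 for x in range(16)] for y in range(16)]
sads = variable_block_sads(cur, ref)
block_count = sum(len(row) for g in sads.values() for row in g)
print(block_count)  # 41 partition SADs per candidate position
```

The pixel differences are computed once and only the additions differ per partition size, which is why an adder tree is a natural hardware mapping for this step.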

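Finally, an H.264 encoder chip is implemented in UMC 0.18 μm 1P6M process technology. According to the synthesis and place-and-route results, the prototype chip has a total logic gate count of 969K, measures 9.92x4.93 mm², and reaches a maximum operating frequency of 120 MHz. When operating at 120 MHz and 1.8 V, its power consumption is 634.9 mW.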
The new video coding standard, H.264/AVC, developed by the Joint Video Team (JVT), significantly outperforms previous standards in compression thanks to new features including motion estimation (ME) with variable block sizes and multiple reference frames, intra prediction, context-based adaptive variable length coding (CAVLC), context-based adaptive binary arithmetic coding (CABAC), an in-loop deblocking filter, and more. Compared with MPEG-4, H.263, and MPEG-2, H.264/AVC achieves bit-rate reductions of 39%, 49%, and 64%, respectively. The penalty is huge computational complexity: up to 3.6 tera-instructions per second of computation and 5.6 terabytes per second of memory access are required for baseline profile level 3.1 (one reference frame and H±64/V±32 full search). Hardware acceleration is therefore a must for real-time applications. However, the reference software processes each block in the macroblock (MB) sequentially and creates data dependencies that are harmful to parallel processing and MB pipelining. A video coding system with the traditional two-stage MB pipeline, prediction (ME) and block engine (BE=MC+DCT+Q+IQ+IDCT+VLC), cannot be applied to H.264/MPEG-4 AVC efficiently because the prediction procedures are much more complex and the reconstruction loop should not be separated from prediction.

In this thesis, the first H.264/MPEG-4 AVC VLSI encoding system is proposed. According to our analysis, five major functions, integer motion estimation (IME), fractional motion estimation (FME), intra prediction (INTRA), entropy coding (EC), and deblocking (DB), are mapped onto four MB pipeline stages with hardware-oriented algorithms and sophisticated scheduling to enable parallel processing and MB pipelining. The bandwidth requirement is reduced by using shared memories and local data transmission. Compared with the reference software, the improved Lagrangian multiplier enhances the compressed video quality by up to 1.2 dB at high bit rates for large frame sizes with large motion. To support the new features of H.264/MPEG-4 AVC in each MB pipeline stage, several new architectures are proposed. In the IME stage, a parallel array of eight 128-PE SAD trees is designed with a snake scan data flow to achieve 100% processing element (PE) utilization and low on-chip SRAM bandwidth. Reusing the overlapped search area saves 87.5% of the off-chip bandwidth. In the FME stage, we analyze the Lagrangian inter-mode decision loops and provide decomposition methodologies to obtain an optimized projection onto hardware. The proposed architecture, which provides 36-fold parallelism per reference frame, is characterized by regular flow and high utilization. In the INTRA stage, a reconfigurable intra predictor generator and a parallel multi-transform engine are applied. In addition, an interleaved schedule and the proposed partial distortion elimination (PDE) scheme are used to meet the real-time constraint with only four-fold parallelism. In the DB stage, an interleaved memory organization and an 8x4-pixel array with a reconfigurable data path are used to support the 2-D filter with only one parallel-in parallel-out reconfigurable 1-D filter. Finally, in the EC stage, a highly utilized CAVLC engine is realized with dual-scan buffers for 4x4-block level pipelining, and a 96-bit packer is proposed to support conversion from raw byte sequence payload (RBSP) to encapsulated byte sequence payload (EBSP).
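
The abstract does not say how the 96-bit packer works internally. As a byte-level illustration of the RBSP-to-EBSP conversion it supports, the sketch below applies the emulation prevention rule of the H.264 specification: whenever two consecutive zero bytes are followed by a byte with value 0x03 or less, a 0x03 byte is inserted first. The function name `rbsp_to_ebsp` is an assumption made for this example, and the special case of an RBSP that ends in a zero byte is not handled.

```python
# Sketch (assumption): software view of emulation prevention, not the
# thesis's 96-bit hardware packer.

def rbsp_to_ebsp(rbsp: bytes) -> bytes:
    """Insert emulation prevention bytes (0x03) into an RBSP."""
    out = bytearray()
    zeros = 0  # consecutive zero bytes already written to the output
    for b in rbsp:
        if zeros >= 2 and b <= 0x03:
            # 00 00 followed by 00, 01, 02 or 03 could be mistaken for a
            # start code prefix, so break the pattern with 0x03.
            out.append(0x03)
            zeros = 0
        out.append(b)
        zeros = zeros + 1 if b == 0x00 else 0
    return bytes(out)

print(rbsp_to_ebsp(bytes([0x00, 0x00, 0x01])).hex())  # 00000301
```

A hardware packer would typically apply the same rule while it shifts variable-length codewords into fixed-width words, but that organization is not stated in this record.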

A prototype chip is implemented using the Artisan 0.18 μm standard CMOS cell library with UMC 0.18 μm 1P6M technology. The total gate count is about 970K, synthesized at 120 MHz. The chip supports H.264/MPEG-4 AVC encoding in baseline profile at level 3.0 with four reference frames at an operating frequency of 81 MHz, and at level 3.1 with one reference frame at 108 MHz. The maximum processing capability is 108K MBs per second, that is, HDTV 720p (1280x720) 4:2:0 30 Hz video. In total, 34.72 Kbytes of on-chip memory and 3.11 Mbytes of off-chip memory are required. The core size is 7.68x4.13 mm^2. The average power dissipation is 635 mW when the chip operates at 120 MHz with a 1.8 V power supply.
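
The throughput figures above can be checked with a short calculation. The sketch below reproduces the 108K MB/s figure for 720p at 30 Hz and derives the clock cycles available per macroblock at the two stated frequencies. The SDTV frame size of 720x480 is an assumption (the abstract only says SDTV and level 3.0), and the function names are made up for this example.

```python
# Sketch: macroblock throughput and per-MB cycle budget.
# Assumption: SDTV is taken as 720x480 at 30 frames per second.

def mb_rate(width, height, fps, mb_size=16):
    """Macroblocks per second for a given frame size and frame rate."""
    mbs_wide = -(-width // mb_size)   # ceiling division
    mbs_high = -(-height // mb_size)
    return mbs_wide * mbs_high * fps

def cycles_per_mb(clock_hz, width, height, fps):
    """Whole clock cycles available to process one macroblock."""
    return clock_hz // mb_rate(width, height, fps)

print(mb_rate(1280, 720, 30))                     # 108000
print(cycles_per_mb(108_000_000, 1280, 720, 30))  # 1000
print(cycles_per_mb(81_000_000, 720, 480, 30))    # 2000
```

Under these assumptions every pipeline stage has about 1000 cycles per macroblock in the HDTV mode and about 2000 cycles in the SDTV mode, where searching four reference frames needs the larger budget.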
Abstract
1 INTRODUCTION
1.1 Video Coding Standards
1.2 H.264/MPEG-4 AVC Standard Overview
1.3 H.264/MPEG-4 AVC Profiles and Levels
1.4 Thesis Organization
2 OVERVIEW OF H.264/MPEG-4 AVC
2.1 Temporal Prediction
2.1.1 Multiple Reference Frames
2.1.2 Variable Block Sizes
2.1.3 Interpolation
2.1.4 Motion Vector Predictor
2.2 Lagrangian Mode Decision
2.3 Spatial Prediction
2.4 Transform and Quantization
2.5 In-Loop Deblocking Filter
2.5.1 Deblocking Flow
2.5.2 Boundary Strength
2.5.3 Filter Decision
2.6 Entropy Coding
2.6.1 Basic Hierarchy
2.6.2 Exp-Golomb Code
2.6.3 Context-Based Adaptive Variable Length Coding
3 SYSTEM DESIGN OF MACROBLOCK PIPELINING FOR H.264/MPEG-4 AVC
3.1 Design Space Exploration and Analysis
3.1.1 System Profiling
3.1.2 Encoding Loop and Data Dependency
3.2 Traditional Platform-Based Video Coding System
3.3 MB Pipelining Scheme in H.264/MPEG-4 AVC
3.3.1 Coding Process Decomposition and Pipelining
3.3.2 Integration of Other Functions
3.3.3 MB Pipelining Schedule
3.3.4 Analysis of Required Parallelism
3.3.5 Bandwidth Consideration
3.3.6 Special Issue in Each Stage
3.4 Encoding System of H.264/MPEG-4 AVC
3.4.1 H.264/MPEG-4 AVC Encoding System
3.4.2 Interface
3.4.3 Control Scheme
3.4.4 Off-chip Memory Organization
4 ARCHITECTURE DESIGN FOR H.264/MPEG-4 AVC
4.1 Integer Motion Estimation Stage
4.1.1 ME Algorithm
4.1.2 Architecture Design
4.1.3 Core Engine with Snake Scan Flow
4.1.4 Memory Organization
4.1.5 Block Diagram of Integer Motion Estimation
4.2 Fractional Motion Estimation with Lagrangian Mode Decision Stage
4.2.1 Procedure Decomposition and Analysis
4.2.2 Architecture Design of FME Module
4.3 Intra Prediction and Reconstruction Stage
4.3.1 Reconfigurable PE for Predictor Generation
4.3.2 Interleaved Scheduling and PDE Scheme
4.3.3 Multi-Transform Engine
4.4 Entropy Coding Stage
4.4.1 Exp-Golomb Coding/CAVLC Core with Block Pipelining Scheme
4.4.2 96-Bit Packer
4.4.3 Bitstream Buffer
4.5 Deblocking Stage
4.5.1 Basic Architecture with 2 Single Port SRAMs
4.5.2 Advanced Architecture with 1 Dual Port SRAM
5 IMPLEMENTATION
5.1 System Specifications
5.2 Design Flow
5.3 Test Consideration
5.4 Implementation Results
5.5 Simulation Result
6 CONCLUSION
[1] Joint Video Team, Draft ITU-T Recommendation and Final Draft International
Standard of Joint Video Specification, ITU-T Rec. H.264 and ISO/IEC 14496-10
AVC, May 2003.
[2] T. Wiegand, G. J. Sullivan, G. Bjøntegaard, and A. Luthra, “Overview of the
H.264/AVC video coding standard,” IEEE Transactions on Circuits and Systems
for Video Technology, vol. 13, no. 7, pp. 560–576, July 2003.
[3] Information Technology - Coding of Audio-Visual Objects - Part 2: Visual,
ISO/IEC 14496-2, 1999.
[4] Video Coding for Low Bit Rate Communication, ITU-T Rec. H.263, 1998.
[5] Information Technology - Generic Coding of Moving Pictures and Associated Audio
Information: Video, ISO/IEC 13818-2 and ITU-T Rec. H.262, 1996.
[6] A. Joch, F. Kossentini, H. Schwarz, T. Wiegand, and G. J. Sullivan, “Performance
comparison of video coding standards using Lagrangian coder control,” in Proc. of
IEEE International Conference on Image Processing, 2002.
[7] Joint Video Team Reference Software JM7.3,
http://bs.hhi.de/ suehring/tml/download/, Aug. 2003.
[8] Coding of moving pictures and associated audio for digital storage media at up to
about 1.5 Mbit/s – Part2: Video, ISO/IEC 11172, 1993.
[9] Video codec for audiovisual services at p x 64 kbit/s, ITU-T Rec. H.261 v1, 1990.
[10] T. Wedi, “Motion compensation in H.264/AVC,” IEEE Transactions on Circuits
and Systems for Video Technology, vol. 13, pp. 577–586, July 2003.
[11] T. Wiegand and B. Girod, Multi-Frame Motion-Compensated Prediction for Video
Transmission, 2002.
[12] T. Wiegand, X. Zhang, and B. Girod, “Long-term memory motion-compensated
prediction,” IEEE Transactions on Circuits and Systems for Video Technology,
vol. 9, pp. 70–84, Feb. 1999.
[13] M. Flierl and B. Girod, “Generalized B pictures and the draft JVT/H.264 video
compression standard,” IEEE Transactions on Circuits and Systems for Video
Technology, vol. 13, pp. 587–597, July 2003.
[14] M. Flierl, T. Wiegand, and B. Girod, “A locally optimal design algorithm for
block-based multi-hypothesis motion-compensated prediction,” in Proc. of Data
Compression Conf., 1998, pp. 239–248.
[15] D. Marpe, H. Schwarz, and T. Wiegand, “Context-based adaptive binary arithmetic
coding in the H.264/AVC video compression standard,” IEEE Transactions on
Circuits and Systems for Video Technology, vol. 13, no. 7, pp. 620–644, July 2003.
[16] H. S. Malvar, A. Hallapuro, M. Karczewicz, and L. Kerofsky, “Low-complexity
transform and quantization in H.264/AVC,” IEEE Transactions on
Circuits and Systems for Video Technology, vol. 13, no. 7, pp. 598–603, July 2003.
[17] P. List, A. Joch, J. Lainema, G. Bjøntegaard, and M. Karczewicz, “Adaptive
deblocking filter,” IEEE Transactions on Circuits and Systems for Video Technology,
vol. 13, no. 7, pp. 614–619, July 2003.
[18] S. Wenger, “H.264/AVC over IP,” IEEE Transactions on Circuits and Systems for
Video Technology, vol. 13, pp. 645–656, July 2003.
[19] T. Stockhammer, M. M. Hannuksela, and T. Wiegand, “H.264/AVC in wireless
environments,” IEEE Transactions on Circuits and Systems for Video Technology,
vol. 13, pp. 657–673, July 2003.
[20] G. J. Sullivan and T. Wiegand, “Rate-distortion optimization for video compression,”
IEEE Signal Processing Magazine, vol. 15, no. 6, pp. 74–90, Nov. 1998.
[21] T. Wiegand and B. Girod, “Lagrangian multiplier selection in hybrid video coder
control,” in Proc. of IEEE International Conference on Image Processing, 2001.
[22] T. Wiegand, H. Schwarz, A. Joch, F. Kossentini, and G. J. Sullivan, “Rate-constrained
coder control and comparison of video coding standards,” IEEE Transactions
on Circuits and Systems for Video Technology, vol. 13, no. 7, pp. 688–703,
July 2003.
[23] T. C. Chen, Y. W. Huang, and L. G. Chen, “Analysis and design of macroblock
pipelining for H.264/AVC VLSI architecture,” in Proc. of ISCAS, May 2004.
[24] F. Catthoor, F. Franssen, S. Wuytack, L. Nachtergaele, and H. D. Man, “Global communication
and memory optimizing transformations for low power signal processing
systems,” Workshop on VLSI Signal Processing, pp. 178–187, 1994.
[25] S. Wuytack, F. Catthoor, L. Nachtergaele, and H. D. Man, “Power exploration for
data dominated video application,” in Proc. of IEEE Symposium on Low Power
Electronics and Designs, 1996, pp. 359–364.
[26] E. D. Greef, F. Catthoor, and H. D. Man, “Program transformation strategies
for memory size and power reduction of pseudoregular multimedia subsystems,”
IEEE Transactions on Circuits and Systems for Video Technology, vol. 8, pp. 719–
733, 1998.
[27] S. Wuytack, J. P. Diguet, F. V. M. Catthoor, and H. J. D. Man, “Formalized
methodology for data reuse exploration for low-power hierarchical memory mappings,”
IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol.
6, pp. 529–537, 1998.
[28] E. Brockmeyer, F. M. M. Catthoor, L. Nachtergaele, J. Bormans, and H. J. D.
Man, “Low power memory storage and transfer organization for MPEG-4 full pel
motion estimation on multimedia processor,” IEEE Transactions on Multimedia,
vol. 1, pp. 202–216, 1999.
[29] T. C. Wang, Y. W. Huang, H. C. Fang, and L. G. Chen, “Performance analysis of
hardware oriented algorithm modifications in H.264,” in Proc. of ICASSP, 2003.
[30] T. Koga, K. Iinuma, A. Hirano, and T. Ishiguro, “Motion compensated interframe
coding for video conferencing,” in Proc. of NTC, Nov. 1981, pp. C9.6.1–9.6.5.
[31] S. Zhu and K. K. Ma, “A new diamond search algorithm for fast block matching
motion estimation,” in Information, Communications and Signal Processing,
1997, pp. 9–12.
[32] J. H. Lee, K. W. Lim, B. C. Song, and J. B. Ra, “A fast multi-resolution block
matching algorithm and its LSI architecture for low bit-rate video coding,” IEEE
Transactions on Circuits and Systems for Video Technology, vol. 11, pp. 1289–
1301, Dec. 2001.
[33] A. Zaccarin and B. Liu, “Fast algorithms for block motion estimation,” in Proc.
of ICASSP, Mar. 1992, pp. 23–26.
[34] Z. L. He, C. Y. Tsui, K. K. Chan, and M. L. Liou, “Low-power VLSI design for
motion estimation using adaptive pixel truncation,” IEEE Transactions on Circuits
and Systems for Video Technology, vol. 10, pp. 669–678, Aug. 2000.
[35] Z. L. He, K. K. Chen, C. Y. Tsui, and M. L. Liou, “Low power motion estimation
design using adaptive pixel truncation,” in Proc. of International Symposium on
Low Power Electronics and Design, Aug. 1997, pp. 167–172.
[36] Y. W. Huang, T. C. Wang, B. Y. Hsieh, and L. G. Chen, “Hardware architecture
design for variable block size motion estimation in MPEG-4
AVC/JVT/ITU-T H.264,” in Proc. of ISCAS, 2003.
[37] M. Y. Hsu, H. C. Chang, and L. G. Chen, “Scalable module-based architecture for
MPEG-4 BMA motion estimation,” in Proc. of ISCAS, 2001, pp. 245–248.
[38] T. C. Chen, Y. W. Huang, and L. G. Chen, “Fully utilized and reusable architecture
for fractional motion estimation of H.264/AVC,” in Proc. of ICASSP, May 2004.
[39] T. C. Wang, Y. W. Huang, H. C. Fang, and L. G. Chen, “Parallel 4x4 2D transform
and inverse transform architecture for MPEG-4 AVC/H.264,” in Proc. of IEEE
International Symposium on Circuits and Systems, 2003.
[40] T. C. Chen, Y. W. Huang, B. Y. Hsieh, and L. G. Chen, “Hardware architecture
design for H.264/AVC intra frame coder,” in Proc. of ISCAS, May 2004.
[41] Y. W. Huang, B. Y. Hsieh, T. C. Chen, and L. G. Chen, “Analysis, fast algorithm,
and VLSI architecture design for H.264/AVC intra frame coder,” IEEE Transactions
on Circuits and Systems for Video Technology, 2004.
[42] S. Eckart and C. Fogg, SPIE Digital Video Compression: Algorithm and Technologies,
ISO/IEC MPEG-2 Software Video Codec, 1995.
[43] J. N. Kim and T. S. Choi, “A fast motion estimation for software based realtime
video coding,” IEEE Transactions on Consumer Electronics, vol. 45, pp. 417–426,
May 1999.
[44] S. M. Lei and M. T. Sun, “An entropy coding system for digital HDTV applications,”
IEEE Transactions on Circuits and Systems for Video Technology, vol. 1,
no. 1, pp. 147–155, Mar. 1991.
[45] Y. W. Huang, T. C. Wang, B. Y. Hsieh, T. C. Wang, T. H. Chang, and L. G. Chen,
“Architecture design for deblocking filter in H.264/JVT/AVC,” in Proc. of ICME,
2003.