National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)


Detailed Record

Author: 陳沐融
Author (English): Chen, Mu-Jung
Title: 幀型自適應條件B-frame編碼
Title (English): Frame-type Adaptive Conditional B-frame Coding
Advisor: 彭文孝
Advisor (English): Peng, Wen-Hsiao
Committee Members: 江瑞秋、彭文孝、蕭旭峰、杭學鳴
Committee Members (English): Chiang, Jui-Chiu; Peng, Wen-Hsiao; Hsiao, Hsu-Feng; Hang, Hsueh-Ming
Oral Defense Date: 2023-10-18
Degree: Master's
Institution: 國立陽明交通大學 (National Yang Ming Chiao Tung University)
Department: 資訊科學與工程研究所 (Institute of Computer Science and Engineering)
Discipline: Engineering
Field: Electrical and Computer Engineering
Document Type: Academic thesis
Publication Year: 2023
Graduation Academic Year: 112 (ROC calendar)
Language: Chinese
Pages: 45
Keywords (Chinese): 影像壓縮、深度學習、雙向編碼、幀型自適應編碼、多尺度特徵混合
Keywords (English): Video Compression; Deep Learning; Bidirectional Coding; Frame-type Adaptive Coding; Multi-scale Feature Fusion
Metrics:
  • Cited: 0
  • Views: 9
  • Rating: none
  • Downloads: 0
  • Bookmarked: 0
Abstract:
Over the past few years, learning-based video compression has become an active research area. However, most works focus on P-frame coding; learned B-frame coding remains under-explored and more challenging. This work introduces a novel B-frame coding framework that exploits motion prediction and multi-scale feature fusion for conditional B-frame coding. Our work features two novel elements: frame-type adaptive coding and B*-frames. Frame-type adaptive coding learns better bit allocation for hierarchical B-frame coding by dynamically adapting the feature distributions according to the B-frame type. B*-frames allow greater flexibility in specifying the group-of-pictures (GOP) structure by reusing the B-frame codec to mimic P-frame coding, without the need for an additional, separate P-frame codec. Additionally, we conduct three insightful studies: ablations of conditioning signal generation, an exploration of the channel-wise autoregressive entropy model for inter-frame coding, and a comprehensive examination of the significant domain shift between the training and test scenarios in hierarchical B-frame coding.
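The record includes no code, but the frame-type adaptive coding idea described above — adapting the codec's feature distributions to a B-frame's position in the coding hierarchy — can be illustrated with a minimal sketch. The sketch below assumes, purely for illustration, that the adaptation is a learned per-temporal-layer, channel-wise affine modulation; the module and parameter names (`FrameTypeAdaptiveModulation`, `num_layers`) are hypothetical and not taken from the thesis.

```python
import torch
import torch.nn as nn

class FrameTypeAdaptiveModulation(nn.Module):
    """Hypothetical sketch (not the thesis's code): channel-wise affine
    modulation of codec features, conditioned on the B-frame's temporal
    layer, so frames at different hierarchy levels can learn different
    feature distributions and hence different bit allocations."""

    def __init__(self, channels: int, num_layers: int):
        super().__init__()
        # One learned (scale, shift) vector of length `channels` per layer.
        self.scale = nn.Embedding(num_layers, channels)
        self.shift = nn.Embedding(num_layers, channels)
        nn.init.ones_(self.scale.weight)   # start as an identity mapping
        nn.init.zeros_(self.shift.weight)

    def forward(self, feat: torch.Tensor, layer: torch.Tensor) -> torch.Tensor:
        # feat: (N, C, H, W); layer: (N,) long tensor of temporal-layer ids.
        s = self.scale(layer)[:, :, None, None]
        b = self.shift(layer)[:, :, None, None]
        return feat * s + b

# Usage: modulate a batch of two feature maps coded at layers 0 and 2.
mod = FrameTypeAdaptiveModulation(channels=64, num_layers=4)
y = mod(torch.randn(2, 64, 16, 16), torch.tensor([0, 2]))
```

A per-layer (scale, shift) pair adds negligible storage, yet lets frames at different hierarchy levels settle into different feature statistics, which is one plausible way the claimed per-layer bit allocation could be realized.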
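Similarly, the B*-frame idea — reusing the B-frame codec so it behaves like a P-frame codec — implies a hierarchical GOP schedule that fits in a few lines. The following sketch is a plausible reconstruction under stated assumptions, not the thesis's actual scheduler: it codes the GOP's trailing frame first as a B*-frame whose two references both point at the preceding anchor, then codes interior frames midpoint-first between their nearest coded neighbours.

```python
def gop_coding_order(gop_size: int):
    """Hypothetical hierarchical-B schedule. Frame 0 is the previously
    coded anchor; frame `gop_size` is coded first as a B*-frame (both
    references set to frame 0, so the B-frame codec mimics P-frame
    coding); remaining frames are coded midpoint-first."""
    order = [(gop_size, 0, 0)]  # (frame, left ref, right ref): the B*-frame

    def split(left: int, right: int) -> None:
        if right - left < 2:
            return
        mid = (left + right) // 2
        order.append((mid, left, right))
        split(left, mid)
        split(mid, right)

    split(0, gop_size)
    return order

# A GOP of 8 is coded as 8 (B*), then 4, 2, 1, 3, 6, 5, 7.
print(gop_coding_order(8))
```

Because the B*-frame is produced by the same codec as every other frame, the GOP length can be varied freely without training or shipping a separate P-frame model, which is the flexibility the abstract claims.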
Table of Contents:

摘要 (Abstract in Chinese)
Abstract
誌謝 (Acknowledgements)
Contents
List of Tables
List of Figures
1 Research Overview
  1.1 Introduction
  1.2 Organization
2 Related Work
  2.1 Learned P-frame Coding
  2.2 Learned B-frame Coding
  2.3 Feature Adaptation
  2.4 Autoregressive Entropy Model
3 Method
  3.1 Preliminary
  3.2 System Overview
  3.3 Conditional Inter-frame and Motion Codecs
  3.4 Frame-type Adaptive Coding
  3.5 Conditioning Signal Generation
  3.6 B*-frame Extension
  3.7 Training
4 Experiments
  4.1 Settings
  4.2 Rate-Distortion Performance
  4.3 Ablation Experiments
    4.3.1 Frame-type Adaptive Coding
    4.3.2 B*-frame Extension
    4.3.3 Conditioning Signal Generation
    4.3.4 Channel-wise Autoregressive Entropy Model
  4.4 Domain Shift in Training B-frame Codecs
  4.5 Complexity Analysis
5 Conclusion
6 Future Work
  6.1 Low-delay B-frame Coding
  6.2 Dynamic Frame-level Bit Allocation
  6.3 Spatial-channel Context Model
  6.4 Transformer-based Codec
References