National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)

Detailed Record

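Graduate student: 林志軒
Graduate student (English name): Lin, Chih-Hsuan
Thesis title: 用於學習式視訊壓縮內容自適應的動量比率適應
Thesis title (English): Content-Adaptive Motion Rate Adaption for Learned Video Compression
Advisor: 彭文孝
Advisor (English name): Peng, Wen-Hsiao
Oral defense committee: 彭文孝, 蕭旭峰, 杭學鳴, 江瑞秋
Oral defense committee (English names): Peng, Wen-Hsiao
Oral defense date: 2022-10-03
Degree: Master's
Institution: National Yang Ming Chiao Tung University (國立陽明交通大學)
Department: Institute of Multimedia Engineering
Discipline: Computer science
Field: Software development
Thesis type: Academic thesis
Year of publication: 2022
Graduation academic year: 111
Language: English
Number of pages: 36
Keywords (Chinese): 自適應學習式視訊壓縮; 視訊壓縮; 條件幀間編碼; 位元分配; 深度學習
Keywords (English): Content-adaptive Learned Video Compression; Video Compression; Conditional Inter-frame Coding; Bit Allocation; Deep Learning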
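This study introduces an online motion rate adaptation scheme for learned video compression. The goal is content-adaptive coding on individual test sequences, which mitigates the domain gap between training and test data. We propose a bit allocation map defined over small pixel patches, called the α-map, that allocates bit rate between the motion model and the inter-frame coding model in a spatially adaptive manner. At inference time, we optimize the α-map through online back-propagation. We also add a look-ahead mechanism that accounts for the α-map's impact on future frames. Extensive experiments confirm that the proposed scheme, when combined with a conditional learned video codec, adapts the motion bit rate effectively and substantially improves rate-distortion performance, particularly on test videos with complex motion. We also apply the method to another model to demonstrate its effectiveness and flexibility.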
This work introduces an online motion rate adaptation scheme for learned video compression. The aim is content-adaptive coding on individual test sequences, which mitigates the domain gap between training and test data. The scheme features a patch-level bit allocation map, termed the α-map, that trades off the bit rates for motion and inter-frame coding in a spatially adaptive manner. We optimize the α-map through online back-propagation at inference time and incorporate a look-ahead mechanism to account for the α-map's impact on future frames. Extensive experimental results confirm that the proposed scheme, when integrated into a conditional learned video codec, adapts the motion bit rate effectively and yields much improved rate-distortion performance, particularly on test sequences with complicated motion characteristics. Ablation experiments on another base model further show the flexibility and efficiency of the proposed method.
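The idea of tuning a patch-level α-map by gradient descent at inference time can be illustrated with a deliberately small sketch. Everything in it is an assumption made for illustration: the quadratic per-patch cost, the names `patch_gradient` and `optimize_alpha_map`, the convention that a larger α spends more motion bits, and the way the look-ahead is imitated by summing costs over a window of frames. The gradient is written out by hand rather than obtained by back-propagation through a codec, so this is not the thesis's loss or implementation.

```python
# Toy sketch only: the cost model below is an assumption made for
# illustration, not the loss or the codec used in the thesis.


def patch_gradient(alpha, motion_cost, inter_cost):
    """Derivative of the toy cost motion_cost*alpha + inter_cost*(1-alpha)**2."""
    return motion_cost - 2.0 * inter_cost * (1.0 - alpha)


def optimize_alpha_map(motion_costs, inter_costs_per_frame, steps=200, lr=0.05):
    """Run gradient descent on a per-patch alpha-map, clipped to [0, 1].

    motion_costs: hypothetical per-patch cost of spending motion bits.
    inter_costs_per_frame: one list of per-patch inter-frame costs for each
        frame in the look-ahead window (the first entry is the current frame).
    """
    num_patches = len(motion_costs)
    # Imitate the look-ahead: sum each patch's inter-frame cost over the window.
    inter_costs = [
        sum(frame[i] for frame in inter_costs_per_frame)
        for i in range(num_patches)
    ]
    # Start from alpha = 1 for every patch (an arbitrary choice for this toy).
    alpha_map = [1.0] * num_patches
    for _ in range(steps):
        for i in range(num_patches):
            grad = patch_gradient(alpha_map[i], motion_costs[i], inter_costs[i])
            # Gradient step followed by projection back onto [0, 1].
            alpha_map[i] = min(1.0, max(0.0, alpha_map[i] - lr * grad))
    return alpha_map


if __name__ == "__main__":
    # Three patches, no look-ahead (window of one frame).
    alpha_map = optimize_alpha_map([1.0, 1.0, 0.0], [[2.0, 0.25, 1.0]])
    print([round(a, 3) for a in alpha_map])
```

In this toy, a patch whose inter-frame cost dominates keeps α high, while a patch where motion bits cost more than they save is driven to zero. The step size is only stable while `lr` times twice the summed inter-frame cost stays below 2, so larger costs would need a smaller `lr`.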
Abstract (Chinese)
Abstract
Contents
List of Figures
List of Tables
1 Research Overview
1.1 Introduction
1.2 Contribution
1.3 Organization
2 Related Work
2.1 Learned Video Compression
2.2 Augmented Normalizing Flow-based Image Compression (ANFIC)
2.3 Conditional Augmented Normalizing Flow-based (CANF-based) Inter-frame Coding
2.4 Content-Adaptive Compression
3 Proposed Method
3.1 System Overview
3.2 Conditional Feature Transformation with the α-Map
3.3 Training Objective
3.4 Determining the α-Map
4 Experiments
4.1 Settings
4.2 Effectiveness of the α-Map
4.3 Rate-Distortion Performance
4.4 Analysis of the Optimized α-Map
4.5 Analysis of the Optimized Bit Allocation
4.6 Analysis of the Special Sequences
4.7 Ablation Experiment on the Decoded α-Map
4.8 Ablation Experiment on Lightweight Base Model
4.9 Analysis of Model Complexity
5 Conclusion
References