|
[1] G. Lu, W. Ouyang, D. Xu, X. Zhang, C. Cai, and Z. Gao, “Dvc: An end-to-end deep video compression framework,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 11 006–11 015. [2] “Ffmpeg,” https://www.ffmpeg.org/, accessed: 2022-05-018. [3] “Hm reference software for hevc,” https://vcgit.hhi.fraunhofer.de/jvet/HM/-/tree/HM-16.23/, accessed: 2022-05-018. [4] “Vtm,” https://vcgit.hhi.fraunhofer.de/jvet/VVCSoftware_VTM/, accessed: 2022-03-02. [5] T. Ladune, P. Philippe, W. Hamidouche, L. Zhang, and O. Déforges, “Optical flow and mode selection for learning-based video coding,” in 2020 IEEE 22nd International Workshop on Multimedia Signal Processing (MMSP). IEEE, 2020, pp. 1–6. [6] J. Li, B. Li, and Y. Lu, “Deep contextual video compression,” Advances in Neural Information Processing Systems, 2021. [7] Y.-H. Ho, C.-P. Chang, P.-Y. Chen, A. Gnutti, and W.-H. Peng, “Canf-vc: Conditional augmented normalizing flows for video compression,” European Conference on Computer Vision, 2022. [8] D. Minnen, J. Ballé, and G. D. Toderici, “Joint autoregressive and hierarchical priors for learned image compression,” Advances in Neural Information Processing Systems, vol. 31, pp. 10 771–10 780, 2018. [9] D. Minnen and S. Singh, “Channel-wise autoregressive entropy models for learned image compression,” in 2020 IEEE International Conference on Image Processing (ICIP), 2020, pp. 3339–3343. [10] D. He, Y. Zheng, B. Sun, Y. Wang, and H. Qin, “Checkerboard context model for efficient learned image compression,” in 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021, pp. 14 766–14 775. [11] D. He, Z. Yang, W. Peng, R. Ma, H. Qin, and Y. Wang, “Elic: Efficient learned image compression with unevenly grouped space-channel contextual adaptive coding,” in 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022, pp. 5708–5717. [12] G. Lu, X. Zhang, W. Ouyang, L. Chen, Z. Gao, and D. Xu, “An end-to-end learning framework for video compression,” IEEE transactions on Pattern Analysis and Machine Intelligence, 2020. [13] E. Agustsson, D. Minnen, N. Johnston, J. Balle, S. J. Hwang, and G. Toderici, “Scalespace flow for end-to-end optimized video compression,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 8503–8512. [14] J. Lin, D. Liu, H. Li, and F. Wu, “M-lvc: multiple frames prediction for learned video compression,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 3546–3554. [15] Z. Hu, Z. Chen, D. Xu, G. Lu, W. Ouyang, and S. Gu, “Improving deep video compression by resolution-adaptive flow coding,” in European Conference on Computer Vision. Springer, 2020, pp. 193–209. [16] O. Rippel, A. G. Anderson, K. Tatwawadi, S. Nair, C. Lytle, and L. Bourdev, “Elfvc: Efficient learned flexible-rate video coding,” in Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), October 2021, pp. 14 479–14 488. [17] H. Liu, M. Lu, Z. Ma, F. Wang, Z. Xie, X. Cao, and Y. Wang, “Neural video coding using multiscale motion compensation and spatiotemporal context model,” IEEE Transactions on Circuits and Systems for Video Technology, 2020. [18] T. Ladune, P. Philippe, W. Hamidouche, L. Zhang, and O. Déforges, “Conditional coding for flexible learned video compression,” in Neural Compression: From Information Theory to Applications– Workshop@ ICLR 2021, 2021. [19] X. Sheng, J. Li, B. Li, L. Li, D. Liu, and Y. Lu, “Temporal context mining for learned video compression,” IEEE Transactions on Multimedia, pp. 1–12, 2022. [20] J. Li, B. Li, and Y. Lu, “Hybrid spatial-temporal entropy modelling for neural video compression,” in Proceedings of the 30th ACM International Conference on Multimedia. ACM, oct 2022. [Online]. Available: https://doi.org/10.1145%2F3503161.3547845 [21] Z. Hu, G. Lu, J. Guo, S. Liu, W. Jiang, and D. Xu, “Coarse-to-fine deep video coding with hyperprior-guided mode prediction,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2022, pp.5921–5930. [22] C.-Y. Wu, N. Singhal, and P. Krahenbuhl, “Video compression through image interpolation,” in Proceedings of the European Conference on Computer Vision (ECCV), September 2018. [23] A. Djelouah, J. Campos, S. Schaub-Meyer, and C. Schroers, “Neural inter-frame compression for video coding,” in Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), October 2019. [24] R. Yang, F. Mentzer, L. V. Gool, and R. Timofte, “Learning for video compression with hierarchical quality and recurrent enhancement,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 6628–6637. [25] R. Pourreza and T. Cohen, “Extending neural p-frame codecs for b-frame coding,” 2021 IEEE/CVF International Conference on Computer Vision (ICCV), pp. 6660–6669, 2021. [26] M. A. Yılmaz and A. M. Tekalp, “End-to-end rate-distortion optimized learned hierarchical bi-directional video compression,” IEEE Transactions on Image Processing, vol. 31, pp. 974–983, 2022. [27] Z. Chen, T. He, X. Jin, and F. Wu, “Learning for video compression,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 30, no. 2, pp. 566–576, 2020. [28] Z. Hu, G. Lu, and D. Xu, “Fvc: A new framework towards deep video compression in feature space,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 1502–1511. [29] G. J. Sullivan, J.-R. Ohm, W.-J. Han, and T. Wiegand, “Overview of the high efficiency video coding (hevc) standard,” IEEE Transactions on circuits and systems for video technology, vol. 22, no. 12, pp. 1649–1668, 2012. [30] B. Bross, Y.-K. Wang, Y. Ye, S. Liu, J. Chen, G. J. Sullivan, and J.-R. Ohm, “Overview of the versatile video coding (vvc) standard and its applications,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 31, no. 10, pp. 3736–3764, 2021. [31] X. Wang, K. Yu, C. Dong, and C. C. Loy, “Recovering realistic texture in image super-resolution by deep spatial feature transform,” 06 2018, pp. 606–615. [32] J. Yu, Z. Lin, J. Yang, X. Shen, X. Lu, and T. Huang, “Free-form image inpainting with gated convolution,” in 2019 IEEE/CVF International Conference on Computer Vision (ICCV), 2019, pp. 4470–4479. [33] M. Flierl and B. Girod, “Generalized b pictures and the draft h.264/avc videocompression standard,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 13, no. 7, pp. 587–597, 2003. [34] A. Ranjan and M. J. Black, “Optical flow estimation using a spatial pyramid network,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2017, pp. 4161–4170. [35] J. Li, B. Li, and Y. Lu, “Neural video compression with diverse contexts,” in IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2023, Vancouver, Canada, June 18-22, 2023, 2023. [36] J. Ballé, D. Minnen, S. Singh, S. J. Hwang, and N. Johnston, “Variational image compression with a scale hyperprior,” in International Conference on Learning Representations, 2018. [37] Z. Liu, H. Mao, C.-Y. Wu, C. Feichtenhofer, T. Darrell, and S. Xie, “A convnet for the 2020s,” in 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022, pp. 11 966–11 976. [38] Z. Huang, T. Zhang, W. Heng, B. Shi, and S. Zhou, “Rife: Real-time intermediate flow estimation for video frame interpolation,” arXiv preprint arXiv:2011.06294, 2020. [39] O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” in International Conference on Medical image computing and computer-assisted intervention. Springer, 2015, pp. 234–241. [40] Y.-H. Ho, C.-C. Chan, W.-H. Peng, H.-M. Hang, and M. Domański, “Anfic: Image compression using augmented normalizing flows,” IEEE Open Journal of Circuits and Systems, vol. 2, pp. 613–626, 2021. [41] T. Xue, B. Chen, J. Wu, D. Wei, and W. T. Freeman, “Video enhancement with task-oriented flow,” International Journal of Computer Vision, vol. 127, no. 8, pp. 1106–1125, 2019. [42] J. B. Diederik P. Kingma, “Adam: A method for stochastic optimization,” International Conference for Learning Representations, 2015. [43] A. Mercat, M. Viitanen, and J. Vanne, “Uvg dataset: 50/120fps 4k sequences for video codec analysis and development,” in Proceedings of the 11th ACM Multimedia Systems Conference, 2020, pp. 297–302. [44] H. Wang, W. Gan, S. Hu, J. Y. Lin, L. Jin, L. Song, P. Wang, I. Katsavounidis, A. Aaron, and C.-C. J. Kuo, “Mcl-jcv: a jnd-based h. 264/avc video quality assessment dataset,” in 2016 IEEE International Conference on Image Processing (ICIP). IEEE, 2016, pp. 1509–1513. [45] F. Bossen et al., “Common test conditions and software reference configurations,” JCTVC-L1100, vol. 12, no. 7, 2013. [46] D. Sun, X. Yang, M.-Y. Liu, and J. Kautz, “Pwc-net: Cnns for optical flow using pyramid, warping, and cost volume,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2018, pp. 8934–8943. [47] Z. Teed and J. Deng, “Raft: Recurrent all-pairs field transforms for optical flow,” in Computer Vision – ECCV 2020, A. Vedaldi, H. Bischof, T. Brox, and J.-M. Frahm, Eds. Cham: Springer International Publishing, 2020, pp. 402–419. [48] W. Park and M. Kim, “Deep predictive video compression using mode-selective uniand bi-directional predictions based on multi-frame hypothesis,” IEEE Access, vol. 9, pp. 72–85, 2021. [49] J. Liu, H. Sun, and J. Katto, “Learned image compression with mixed transformercnn architectures,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 1–10. [50] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. u. Kaiser, and I. Polosukhin, “Attention is all you need,” in Advances in Neural Information Processing Systems, I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, Eds., vol. 30. Curran Associates, Inc., 2017. [Online]. Available: https://proceedings.neurips.cc/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf [51] Y. Zhu, Y. Yang, and T. Cohen, “Transformer-based transform coding,” in International Conference on Learning Representations, 2022. [Online]. Available: https://openreview.net/forum?id=IDwN6xjHnK8 [52] Z. Liu, Y. Lin, Y. Cao, H. Hu, Y. Wei, Z. Zhang, S. Lin, and B. Guo, “Swin transformer: Hierarchical vision transformer using shifted windows,” in Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2021. [53] F. Mentzer, G. Toderici, D. Minnen, S.-J. Hwang, S. Caelles, M. Lucic, and E. Agustsson, “Vct: A video compression transformer,” 2022. [Online]. Available: https://arxiv.org/abs/2206.07307 [54] J. Xiang, K. Tian, and J. Zhang, “Mimt: Masked image modeling transformer for video compression,” in The Eleventh International Conference on Learning Representations, 2022.
|