

Title (in English): Video Reordering with Optical Flows and Autoencoder
Advisor (in English): Tong-Yee Lee
Keywords (in English): video resequencing, autoencoder architecture, optical flows, path finding algorithms
To solve the general video resequencing problem, we propose a novel deep learning framework that generates natural result videos with smooth motion. Given an unordered image collection or a video, we first extract latent vectors from the images or video frames with a network architecture we propose. We then build a complete graph whose edge weights are the distances between these latent vectors. Three different path-finding algorithms traverse this graph to produce video sequences, corresponding to three applications of our framework: original video reconstruction, in-between frame insertion, and video resequencing. To keep the motion of the resulting videos as smooth and reasonable as possible, we use optical flows as constraints in the path-finding algorithms, and the proposed network architecture is also used to compute the differences between optical flows. The experimental evaluation demonstrates that our proposed network outperforms previous work on feature extraction, and the appealing result videos show that our framework can be applied to many styles of video and unordered image collections, including cartoon and realistic footage, without the unappealing motion problems of previous studies.
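To make the pipeline above concrete, the following Python sketch illustrates only the graph-and-traversal idea. It assumes the per-frame latent vectors and a pairwise optical-flow difference matrix have already been computed, and it uses a simple greedy nearest-neighbor traversal in place of the three path-finding algorithms described in Chapter 3; all names (build_complete_graph, greedy_resequence, the alpha weighting) are illustrative assumptions, not the thesis implementation.

# Minimal sketch of the resequencing idea: build a complete graph whose edge
# weights are latent-vector distances, then traverse it greedily while adding
# an optical-flow smoothness penalty. Hypothetical illustration only.
import numpy as np

def build_complete_graph(latents):
    """Pairwise Euclidean latent distances -> symmetric weight matrix."""
    n = len(latents)
    w = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            w[i, j] = w[j, i] = float(np.linalg.norm(latents[i] - latents[j]))
    return w

def greedy_resequence(weights, flow_diff, start=0, alpha=0.5):
    """
    Greedy path finding: from the current frame, pick the unvisited frame that
    minimizes latent distance + alpha * optical-flow difference, so consecutive
    transitions stay visually close and the motion stays smooth.
    flow_diff[i, j] is assumed to be a precomputed dissimilarity between the
    optical flows associated with frames i and j.
    """
    n = weights.shape[0]
    visited, order = {start}, [start]
    while len(order) < n:
        cur = order[-1]
        cost = weights[cur] + alpha * flow_diff[cur]
        cost[list(visited)] = np.inf       # never revisit a frame
        nxt = int(np.argmin(cost))
        visited.add(nxt)
        order.append(nxt)
    return order

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    latents = rng.normal(size=(8, 16))     # 8 frames, 16-D latent vectors
    flow_diff = np.abs(rng.normal(size=(8, 8)))
    flow_diff = (flow_diff + flow_diff.T) / 2
    print(greedy_resequence(build_complete_graph(latents), flow_diff))

In this sketch a single weight alpha trades off perceptual distance against motion smoothness; the thesis instead imposes the optical-flow constraints directly inside each of the three path-finding algorithms.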
Abstract (in Chinese) i
Abstract ii
Acknowledgements iii
Table of Contents iv
List of Tables v
List of Figures vi
Chapter 1 Introduction 1
Chapter 2 Related Work 3
2.1 Feature Extraction and Dimension Reduction 3
2.2 Image Sequence Ordering 4
Chapter 3 Method 7
3.1 Perceptual distance 8
3.1.1 Network architecture 10
3.1.2 Training 12
3.2 Optical flow coherency 13
3.2.1 Optical flow computing 13
3.2.2 Difference of optical flow 15
3.3 Animation sequencing 16
3.3.1 Original video reconstruction 18
3.3.2 In-between frames insertion 19
3.3.3 Animation resequencing 20
Chapter 4 Result 27
4.1 2AFC dataset comparison 27
4.2 Encoder evaluation 29
4.3 Video Results 31
4.3.1 In-between frames insertion results 31
4.3.2 Video resequencing results 32
Chapter 5 Conclusion and Future Work 34
References 35
[1] O. Fried, S. Avidan, and D. Cohen-Or. “Patch2vec: Globally consistent image patch representation.” In Computer Graphics Forum, volume 36, pages 183–194. Wiley Online Library, 2017.
Available: https://onlinelibrary.wiley.com/doi/abs/10.1111/cgf.13284
[2] J. Yu, D. Tao, J. Li, and J. Chen. “Semantic preserving distance metric learning and applications.” Inform. Sci., 281 (2014), 674–686.
Available: http://dx.doi.org/10.1016/j.ins.2014.01.025
[3] Y. Yang, Y. Zhuang, D. Tao, D. Xu, J. Yu, and J. Luo. “Recognizing cartoon image gestures for retrieval and interactive cartoon clip synthesis.” IEEE Trans. Circuits Syst. Video Technol., vol. 20, no. 12, pp. 1745–1756, Dec. 2010.
[4] A. Gammerman, V. Vovk, and V. Vapnik. “Learning by transduction.” arXiv preprint, arXiv:1301.7375, 2013.
[5] A. Schödl, R. Szeliski, D. H. Salesin, and I. A. Essa. “Video textures.” In Proceedings of SIGGRAPH 2000 (July), pp. 489–498. ISBN 1-58113-208-5.
[6] L. P. Kaelbling, M. L. Littman, and A. W. Moore. “Reinforcement learning: A survey.” J. Artif. Int. Res., vol. 4, no. 1, pp. 237–285, May 1996. [Online].
Available: http://dl.acm.org/citation.cfm?id=1622737.1622748
[7] J. Yu, D. Tao, and M. Wang. “Semi-automatic cartoon generation by motion planning.” Multimedia Systems, 17(5):409–419, 2011.
[8] C. C. Morace, C.-K. Yeh, S.-W. Zhang, and T.-Y. Lee. “Learning a Perceptual Manifold with Deep Features for Animation Video Resequencing.” Transactions on Visualization and Computer Graphics, Sep. 2018.
[9] J. Zhang, J. Yu, and D. Tao. “Local deep-feature alignment for unsupervised dimension reduction.” IEEE Trans. Image Process., vol. 27, no. 5, pp. 2420–2432, May 2018.
[10] M. Osadchy, Y. L. Cun, and M. L. Miller. “Synergistic face detection and pose estimation with energy-based models.” J. Mach. Learn. Res., vol. 8, pp. 1197–1215, May 2007. [Online].
Available: http://dl.acm.org/citation.cfm?id=1248659.1248700
[11] D. Holden, J. Saito, T. Komura, and T. Joyce. “Learning motion manifolds with convolutional autoencoders.” In SIGGRAPH Asia 2015 Technical Briefs, ser. SA ’15. New York, NY, USA: ACM, 2015, pp. 18:1–18:4. [Online].
Available: http://doi.acm.org/10.1145/2820903.2820918
[12] A. Schödl and I. A. Essa. “Machine learning for video-based rendering.” In Advances in Neural Information Processing Systems 13, T. K. Leen, T. G. Dietterich, and V. Tresp, Eds. MIT Press, 2001, pp. 1002–1008. [Online].
Available: http://papers.nips.cc/paper/1874-machine-learning-for-video-based-rendering.pdf
[13] A. Schödl and I. A. Essa. “Controlled animation of video sprites.” In Proceedings of the 2002 ACM SIGGRAPH/Eurographics Symposium on Computer Animation, ser. SCA ’02. New York, NY, USA: ACM, 2002, pp. 121–127. [Online].
Available: http://doi.acm.org/10.1145/545261.545281
[14] S.-W. Zhang, C. C. Morace, T. N. H. Le, C.-K. Yeh, S.-S. Lin, S.-Y. Yao, and T.-Y. Lee. “Animation Video Resequencing with a Convolutional AutoEncoder.” SIGGRAPH Asia 2019 Posters, Brisbane, Australia, Nov. 2019.
[15] R. Zhang, P. Isola, A. A. Efros, E. Shechtman, and O. Wang. “The unreasonable effectiveness of deep features as a perceptual metric.” CoRR, vol. abs/1801.03924, 2018. [Online].
Available: http://arxiv.org/abs/1801.03924
[16] K. Simonyan and A. Zisserman. “Very deep convolutional networks for large-scale image recognition.” arXiv preprint, arXiv:1409.1556, 2014.
[17] K. He, X. Zhang, S. Ren, and J. Sun. “Identity Mappings in Deep Residual Networks.” CoRR, vol. abs/1603.05027, 2016. [Online].
Available: https://arxiv.org/abs/1603.05027
[18] L. A. Gatys, A. S. Ecker, and M. Bethge. “Image style transfer using convolutional neural networks.” In CVPR, 2016.
[19] G. Huang, Z. Liu, K. Q. Weinberger, and L. van der Maaten. “Densely connected convolutional networks.” In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, vol. 1, p. 3, 2017.
[20] D. P. Kingma and J. Ba. “Adam: A method for stochastic optimization.” CoRR, vol. abs/1412.6980, 2014. [Online].
Available: http://arxiv.org/abs/1412.6980
[21] D. Sun, X. Yang, M.-Y. Liu, and J. Kautz. “PWC-Net: CNNs for optical flow using pyramid, warping, and cost volume.” arXiv preprint, arXiv:1709.02371, 2017.