Author: 吳俊德 (Chun-Te Wu)
Title (Chinese): 利用自動編碼器與光流對影片進行重新編排
Title (English): Video Reordering with Optical Flows and Autoencoder
Advisor: 李同益 (Tong-Yee Lee)
Degree: Master's
Institution: 國立成功大學 (National Cheng Kung University)
Department: 資訊工程學系 (Computer Science and Information Engineering)
Discipline: Engineering
Field: Electrical and Computer Engineering
Thesis Type: Academic thesis
Publication Year: 2020
Graduation Academic Year: 108 (2019–2020)
Language: English
Pages: 36
Keywords (Chinese): 影片重組、自動編碼器、光流、路徑搜尋演算法
Keywords (English): video resequencing, autoencoder architecture, optical flows, path finding algorithms
Usage statistics:
  • Times cited: 0
  • Views: 27
  • Downloads: 0
  • Bookmarked: 0
Abstract: To solve the general video resequencing problem, we propose a novel deep learning framework that generates natural result videos with smooth motion. Given an unordered image collection or a video, we first extract a latent vector from each image or video frame with a network architecture we propose. We then build a complete graph whose edge weights are the distances between latent vectors. Finally, depending on the user's requirements, one of three path-finding algorithms traverses the graph to produce the output sequence; these correspond to the three applications of our framework: original video reconstruction, in-between frame insertion, and video resequencing. To keep the motion of the resulting videos as smooth and plausible as possible, we use optical flows as constraints in the path-finding algorithms, and the differences between optical flows are computed with our proposed network. Experimental evaluation shows that our network outperforms previous work on feature extraction, and the result videos demonstrate that our framework applies to many styles of videos and unordered image collections, including cartoons, animation, and real-world footage, without the implausible motions that appeared in previous studies.
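The abstract compresses the whole pipeline into a few steps, so a minimal sketch may help make it concrete: frames are embedded as latent vectors, a complete graph is built from the pairwise distances between those vectors, and a path through the graph determines the output frame order, with an optical-flow term penalizing abrupt motion changes. Everything below is illustrative only: the greedy traversal stands in for the thesis's three path-finding algorithms, the mean-flow penalty stands in for its learned flow difference, and names such as greedy_resequence and the blending weight alpha are hypothetical.

import numpy as np

def pairwise_distances(latents: np.ndarray) -> np.ndarray:
    # Edge weights of the complete graph: Euclidean distance between latent vectors.
    diff = latents[:, None, :] - latents[None, :, :]
    return np.linalg.norm(diff, axis=-1)

def flow_penalty(flows: np.ndarray) -> np.ndarray:
    # Illustrative coherence term: distance between the mean flow vectors of two
    # frames (the thesis computes flow differences with its own network instead).
    mean_flow = flows.reshape(len(flows), -1, 2).mean(axis=1)  # (N, 2) average motion
    diff = mean_flow[:, None, :] - mean_flow[None, :, :]
    return np.linalg.norm(diff, axis=-1)

def greedy_resequence(latents: np.ndarray, flows: np.ndarray,
                      alpha: float = 0.5, start: int = 0) -> list:
    # Traverse the complete graph greedily, trading off perceptual distance
    # against flow coherence; alpha is a hypothetical blending weight.
    cost = pairwise_distances(latents) + alpha * flow_penalty(flows)
    n = len(latents)
    order, visited = [start], {start}
    while len(order) < n:
        last = order[-1]
        # Pick the unvisited frame with the lowest combined transition cost.
        nxt = min((i for i in range(n) if i not in visited),
                  key=lambda i: cost[last, i])
        order.append(nxt)
        visited.add(nxt)
    return order

# Toy usage: 8 frames with 16-D latent codes and 4x4 flow fields.
rng = np.random.default_rng(0)
print(greedy_resequence(rng.normal(size=(8, 16)), rng.normal(size=(8, 4, 4, 2))))

The two cost terms divide the work as the abstract describes: the perceptual term keeps adjacent output frames visually similar, while the flow term is what suppresses the back-and-forth motion artifacts attributed to earlier methods.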
Abstract (Chinese)
Abstract (English)
Acknowledgments
Table of Contents
List of Tables
List of Figures
Chapter 1 Introduction
Chapter 2 Related Work
2.1 Feature Extraction and Dimension Reduction
2.2 Image Sequence Ordering
Chapter 3 Method
3.1 Perceptual Distance
3.1.1 Network Architecture
3.1.2 Training
3.2 Optical Flow Coherency
3.2.1 Optical Flow Computing
3.2.2 Difference of Optical Flows
3.3 Animation Sequencing
3.3.1 Original Video Reconstruction
3.3.2 In-between Frame Insertion
3.3.3 Animation Resequencing
Chapter 4 Results
4.1 2AFC Dataset Comparison
4.2 Encoder Evaluation
4.3 Video Results
4.3.1 In-between Frame Insertion Results
4.3.2 Video Resequencing Results
Chapter 5 Conclusion and Future Work
References
[1] O. Fried, S. Avidan, and D. Cohen-Or, "Patch2vec: Globally consistent image patch representation," in Computer Graphics Forum, vol. 36, pp. 183–194, Wiley Online Library, 2017. Available: https://onlinelibrary.wiley.com/doi/abs/10.1111/cgf.13284
[2] J. Yu, D. Tao, J. Li, and J. Chen, "Semantic preserving distance metric learning and applications," Inform. Sci., vol. 281, pp. 674–686, 2014. Available: http://dx.doi.org/10.1016/j.ins.2014.01.025
[3] Y. Yang, Y. Zhuang, D. Tao, D. Xu, J. Yu, and J. Luo, "Recognizing cartoon image gestures for retrieval and interactive cartoon clip synthesis," IEEE Trans. Circuits Syst. Video Technol., vol. 20, no. 12, pp. 1745–1756, Dec. 2010.
[4] A. Gammerman, V. Vovk, and V. Vapnik, "Learning by transduction," arXiv preprint arXiv:1301.7375, 2013.
[5] A. Schödl, R. Szeliski, D. H. Salesin, and I. Essa, "Video textures," in Proceedings of SIGGRAPH 2000, Jul. 2000, pp. 489–498.
[6] L. P. Kaelbling, M. L. Littman, and A. W. Moore, "Reinforcement learning: A survey," J. Artif. Int. Res., vol. 4, no. 1, pp. 237–285, May 1996. Available: http://dl.acm.org/citation.cfm?id=1622737.1622748
[7] J. Yu, D. Tao, and M. Wang, "Semi-automatic cartoon generation by motion planning," Multimedia Systems, vol. 17, no. 5, pp. 409–419, 2011.
[8] C. C. Morace, C.-K. Yeh, S.-W. Zhang, and T.-Y. Lee, "Learning a perceptual manifold with deep features for animation video resequencing," IEEE Transactions on Visualization and Computer Graphics, Sep. 2018.
[9] J. Zhang, J. Yu, and D. Tao, "Local deep-feature alignment for unsupervised dimension reduction," IEEE Trans. Image Process., vol. 27, no. 5, pp. 2420–2432, May 2018.
[10] M. Osadchy, Y. L. Cun, and M. L. Miller, "Synergistic face detection and pose estimation with energy-based models," J. Mach. Learn. Res., vol. 8, pp. 1197–1215, May 2007. Available: http://dl.acm.org/citation.cfm?id=1248659.1248700
[11] D. Holden, J. Saito, T. Komura, and T. Joyce, "Learning motion manifolds with convolutional autoencoders," in SIGGRAPH Asia 2015 Technical Briefs (SA '15), New York, NY, USA: ACM, 2015, pp. 18:1–18:4. Available: http://doi.acm.org/10.1145/2820903.2820918
[12] A. Schödl and I. A. Essa, "Machine learning for video-based rendering," in Advances in Neural Information Processing Systems 13, T. K. Leen, T. G. Dietterich, and V. Tresp, Eds., MIT Press, 2001, pp. 1002–1008. Available: http://papers.nips.cc/paper/1874-machine-learning-for-video-based-rendering.pdf
[13] A. Schödl and I. A. Essa, "Controlled animation of video sprites," in Proceedings of the 2002 ACM SIGGRAPH/Eurographics Symposium on Computer Animation (SCA '02), New York, NY, USA: ACM, 2002, pp. 121–127. Available: http://doi.acm.org/10.1145/545261.545281
[14] S.-W. Zhang, C. C. Morace, T. N. H. Le, C.-K. Yeh, S.-S. Lin, S.-Y. Yao, and T.-Y. Lee, "Animation video resequencing with a convolutional autoencoder," SIGGRAPH Asia 2019 Posters, Brisbane, Australia, Nov. 2019.
[15] R. Zhang, P. Isola, A. A. Efros, E. Shechtman, and O. Wang, "The unreasonable effectiveness of deep features as a perceptual metric," CoRR, vol. abs/1801.03924, 2018. Available: http://arxiv.org/abs/1801.03924
[16] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," arXiv preprint arXiv:1409.1556, 2014.
[17] K. He, X. Zhang, S. Ren, and J. Sun, "Identity mappings in deep residual networks," CoRR, vol. abs/1603.05027, 2016. Available: https://arxiv.org/abs/1603.05027
[18] L. A. Gatys, A. S. Ecker, and M. Bethge, "Image style transfer using convolutional neural networks," in CVPR, 2016.
[19] G. Huang, Z. Liu, K. Q. Weinberger, and L. van der Maaten, "Densely connected convolutional networks," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, vol. 1, p. 3, 2017.
[20] D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," CoRR, vol. abs/1412.6980, 2014. Available: http://arxiv.org/abs/1412.6980
[21] D. Sun, X. Yang, M.-Y. Liu, and J. Kautz, "PWC-Net: CNNs for optical flow using pyramid, warping, and cost volume," arXiv preprint arXiv:1709.02371, 2017.