National Digital Library of Theses and Dissertations in Taiwan

Detailed Record

Author: 曹駿杰 (Chou, Chon-Kit)
Thesis title: Line-Guided Animation from Single Image Using Diffusion Model (擴散模型的線條引導單張圖片產生動畫)
Advisor: 李同益 (Lee, Tong-Yee)
Oral defense committee: 李同益 (Lee, Tong-Yee), 林昭宏 (Lin, Chao-Hung), 孫永年 (Sun, Yung-Nien), 顏韶威 (Yen, Shao-Wei), 姚智原 (Yao, Chih-Yuan)
Oral defense date: 2023-07-27
Degree: Master's
Institution: National Cheng Kung University (國立成功大學)
Department: Department of Computer Science and Information Engineering (資訊工程學系)
Discipline: Engineering
Field: Electrical and Computer Engineering
Thesis type: Academic thesis
Year of publication: 2023
Academic year of graduation: 111 (AY 2022–2023)
Language: English
Number of pages: 53
Keywords (Chinese): 單張圖片, 擴散模型, 流體動畫, 線條控制
Keywords (English): Single Image, Diffusion Model, Fluid Animation, Line Control
In this thesis, we propose a diffusion model that uses line control to generate water flow, replacing the conventional image-warping approach. Generating a dynamic video from a single image has traditionally been a cumbersome and time-consuming task. The method presented here lets beginners effortlessly create the animation they have in mind: they only need to draw a few directional lines indicating how the water should flow, and the model automatically generates a water-flow animation, with no need to learn any specialized technical knowledge.
In our method, the first step is to separate the foreground from the background so that each can be handled independently. This lets the user clearly select the region to be animated and draw flow lines inside it; the only two actions required of the user are selecting the animation region and sketching the flow directions. The drawn lines are then converted by a model into the optical flow data required by our diffusion model. In the diffusion model, we replace the usual textual condition with optical flow maps, which allows the model to reproduce the user's intended result more accurately. Finally, each newly generated image is fed back as the next input to the model, producing subsequent frames sequentially, and the resulting frames are compiled into a coherent video, as sketched below.
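The record contains no code, but the line-to-flow conversion described above can be illustrated with a minimal sketch. The thesis uses a learned flow map generator for this step (Section 3.2); the function below, `lines_to_flow_map`, is a hypothetical, hand-rolled approximation intended only to convey the data format: it scatters the direction of each drawn stroke onto a pixel grid and blurs the result into a dense flow field.

```python
import numpy as np
from scipy.ndimage import gaussian_filter


def lines_to_flow_map(lines, height, width, sigma=15.0):
    """Hypothetical sketch: rasterize user-drawn direction lines into a dense flow map.

    lines  -- list of polylines, each a sequence of (x, y) points in image
              coordinates, ordered in the direction the water should move.
    Returns a (height, width, 2) array of (dx, dy) direction vectors.
    """
    flow = np.zeros((height, width, 2), dtype=np.float32)
    weight = np.zeros((height, width), dtype=np.float32)

    for line in lines:
        pts = np.asarray(line, dtype=np.float32)
        for (x0, y0), (x1, y1) in zip(pts[:-1], pts[1:]):
            d = np.array([x1 - x0, y1 - y0], dtype=np.float32)
            length = float(np.linalg.norm(d))
            if length < 1e-6:
                continue
            d /= length  # unit direction of this stroke segment
            # Scatter the direction vector at points sampled along the segment.
            for t in np.linspace(0.0, 1.0, max(int(length), 2)):
                x = int(round(x0 + t * (x1 - x0)))
                y = int(round(y0 + t * (y1 - y0)))
                if 0 <= x < width and 0 <= y < height:
                    flow[y, x] += d
                    weight[y, x] += 1.0

    # Blur the scattered vectors and their weights, then normalize, so the
    # sparse strokes spread into a smooth, dense motion field.
    for c in range(2):
        flow[..., c] = gaussian_filter(flow[..., c], sigma)
    weight = gaussian_filter(weight, sigma)
    valid = weight > 1e-6
    flow[valid] /= weight[valid][:, None]
    return flow
```

Under these assumptions, calling `lines_to_flow_map(strokes, H, W)` on a few strokes drawn inside the selected water region yields the kind of per-pixel motion field that, in the thesis pipeline, is produced by the learned generator and passed to the diffusion model in place of a text prompt.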
The video generation method presented in this thesis requires only a single image; no video input is needed, which substantially reduces the computational cost. Moreover, compared with other work, the properties of the diffusion model make it easier to handle common animation problems such as ghosting, background deformation, and loss of realism: other approaches may need a separate technique for each of these issues, whereas the diffusion-based method handles them without extra processing. The thesis concludes with comparisons against other methods and highlights where the proposed approach outperforms them.
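For readers who want the shape of the objective, the following is a hedged sketch using the standard conditional denoising formulation of DDPMs and latent diffusion models; the thesis's exact losses and conditioning architecture are given in Chapter 3, and the symbols here ($z_t$, $F$, $G_\theta$) are ours rather than the author's. With $z_0$ the latent of a frame, $F$ the flow map derived from the user's strokes, and $\epsilon_\theta$ the denoising network, training minimizes

\[
  \mathcal{L} \;=\; \mathbb{E}_{z_0,\, F,\, \epsilon \sim \mathcal{N}(0, I),\, t}
  \Big[\, \big\lVert \epsilon - \epsilon_\theta(z_t,\, t,\, F) \big\rVert_2^2 \,\Big],
  \qquad z_t = \sqrt{\bar{\alpha}_t}\, z_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon,
\]

and at inference the frames are rolled out autoregressively from a single photograph,

\[
  I_{k+1} \;=\; G_\theta(I_k,\, F), \qquad k = 0, 1, \dots, K-1,
\]

so the only inputs are the initial image $I_0$ and the flow map $F$; no input video is required.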
摘要 I
ABSTRACT II
致謝 III
TABLE OF CONTENTS IV
LIST OF FIGURES VI
1 INTRODUCTION 1
1.1 MOTIVATION 1
1.2 SYSTEM FRAMEWORK 3
1.3 CONTRIBUTION 5
2 RELATED WORK 6
2.1 DIFFUSION MODELS 6
2.2 TEXT-TO-IMAGE GENERATION 7
2.3 VIDEO DIFFUSION MODELS 10
2.4 VIDEO EDITING 12
3 METHOD 16
3.1 PRELIMINARIES 17
3.1.1 Why is diffusion needed? 17
3.1.2 Denoising diffusion probabilistic models (DDPMs) 19
3.1.3 Latent diffusion models (LDMs) 22
3.2 FLOW MAP GENERATOR 23
3.2.1 Flow Map 23
3.2.2 Line and Flow map Conversion 25
3.3 CONTROL LINE DIFFUSION MODEL 27
3.3.1 Autoencoder 28
3.3.2 Adjustment between Training Frames 29
3.3.3 Diffusion stage 32
3.4 REGION MATTING 33
3.4.1 Mask Matting 34
3.4.2 Foreground and Background Processing 35
4 RESULT AND DISCUSSION 37
4.1 TRAINING AND IMPLEMENTATION DETAILS 37
4.2 DATASETS 37
4.3 RESULTS 38
4.4 RESULT COMPARISON 41
4.5 FAILURE CASES 49
5 CONCLUSION AND FUTURE WORK 51
REFERENCES 52
Electronic full text (open internet access date: 2028-08-21)