臺灣博碩士論文加值系統 (National Digital Library of Theses and Dissertations in Taiwan)

Detailed Record

Student: 賴濬維 (Lai, Chun-Wei)
Title (Chinese): 使用BiLSTM-VAE進行碎片化3D人體運動序列的填充和重建
Title (English): Infilling and Reconstruction of Fragmented 3D Human Motion Sequence using BiLSTM-VAE
Advisor: 蘇文鈺 (Su, Wen-Yu)
Committee Members: 梁勝富、朱威達、胡敏君
Oral Defense Date: 2023-06-21
Degree: Master's
University: 國立成功大學 (National Cheng Kung University)
Department: 人工智慧科技碩士學位學程 (Master's Program in Artificial Intelligence Technology)
Discipline: Computer Science
Subfield: Software Development
Thesis Type: Academic thesis
Publication Year: 2023
Graduation Academic Year: 111 (ROC calendar)
Language: English
Pages: 31
Keywords (Chinese): 深度神經網路、人體動作填充、人體動作重建
Keywords (English): Deep Neural Network, Human Motion Infilling, Human Motion Reconstruction
Statistics:
  • Cited: 0
  • Views: 77
  • Rating: (not yet rated)
  • Downloads: 6
  • Bookmarked: 0
Abstract:
Traditional 3D character animation involves the time-consuming, labor-intensive work of drawing keyframes and filling in the frames between them. To address this challenge, we propose using deep neural networks to generate transition clips between actions or keyframes, so that multiple clips can be seamlessly stitched into new animations at low cost.
In this thesis, we introduce a skeleton animation synthesizer that seamlessly combines multiple motion sequences into a single cohesive, uninterrupted sequence. The synthesizer is publicly available on GitHub and features a user-friendly interface. Building upon Lin's method [26], we enhance the realism of the generated motions and achieve better performance in terms of MPJPE. In addition, popular 3D human pose estimation techniques such as MediaPipe and OpenPose suffer from missing joints caused by estimation errors; our experiments confirm that the proposed synthesizer effectively reconstructs the original motion sequences from such fragmented input.
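This record contains no code, but to make the architecture named in the title concrete, here is a minimal, hypothetical sketch of a BiLSTM-VAE motion infiller. The layer sizes, the zero-and-mask input convention, and the loss weighting are illustrative assumptions, not the thesis's actual configuration (Chapter 3 of the thesis defines the real model and its joint angle, motion coherence, and KL losses):

```python
import torch
import torch.nn as nn

class BiLSTMVAE(nn.Module):
    """Hypothetical BiLSTM-VAE for motion infilling (illustrative only).

    Input:  (batch, frames, joints*3) pose sequence with gap frames zeroed,
            plus a binary mask of the same shape (1 = observed).
    Output: reconstructed/infilled sequence of the same shape.
    """

    def __init__(self, pose_dim=51, hidden=256, latent=64):
        super().__init__()
        # The mask is concatenated to the pose so the encoder sees what is missing.
        self.encoder = nn.LSTM(pose_dim * 2, hidden, batch_first=True,
                               bidirectional=True)
        self.to_mu = nn.Linear(hidden * 2, latent)
        self.to_logvar = nn.Linear(hidden * 2, latent)
        self.decoder = nn.LSTM(latent, hidden, batch_first=True,
                               bidirectional=True)
        self.to_pose = nn.Linear(hidden * 2, pose_dim)

    def forward(self, poses, mask):
        h, _ = self.encoder(torch.cat([poses * mask, mask], dim=-1))
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        # Reparameterization trick: sample z = mu + sigma * eps.
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        out, _ = self.decoder(z)
        return self.to_pose(out), mu, logvar

def vae_loss(pred, target, mask, mu, logvar, beta=1e-3):
    # Reconstruction error on observed frames plus the standard VAE KL term.
    recon = ((pred - target) ** 2 * mask).mean()
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + beta * kl
```

Training would zero out the frames to be infilled, supervise only the observed ones, and evaluate the quality of the filled-in span with MPJPE, the metric defined after the next paragraph.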
Through extensive experimentation on benchmark datasets including Human3.6M, Mixamo, and ChoreoMaster, our method shows promising results in generating plausible and coherent motions for a variety of human motion animation tasks.
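MPJPE (Mean Per-Joint Position Error), the metric named above, has a standard definition: the Euclidean distance between predicted and ground-truth joint positions, averaged over all joints and frames. A minimal reference implementation, assuming poses are (frames, joints, 3) arrays in consistent units (Human3.6M results are conventionally reported in millimeters):

```python
import numpy as np

def mpjpe(pred, gt):
    """Mean Per-Joint Position Error.

    pred, gt: (frames, joints, 3) arrays of 3D joint positions
    in the same units (e.g. mm for Human3.6M).
    """
    # Euclidean distance per joint per frame, then average everything.
    return np.linalg.norm(pred - gt, axis=-1).mean()
```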
Table of Contents

中文摘要 (Chinese Abstract) i
Abstract ii
Acknowledgements iv
Contents v
List of Tables vii
List of Figures viii
1 Introduction 1
2 Related Works 3
2.1 Human Motion Skeleton Datasets 3
2.2 Human Motion Synthesis 4
2.3 Human Motion Capture 6
3 Method 7
3.1 Problem Formulation 7
3.2 Data Preprocessing 7
3.3 Model Architecture 12
3.4 Loss Functions 14
3.4.1 Joint Angle Loss 14
3.4.2 Motion Coherence Loss 14
3.4.3 KL Divergence 14
3.5 Animated Datasets 16
4 Experiment Results 17
4.1 Datasets 17
4.1.1 Human3.6M 17
4.1.2 Animated Dataset 17
4.1.3 ChoreoMaster 18
4.2 Evaluation Metrics 18
4.3 Results 19
4.3.1 Infilling Results 19
4.3.2 Reconstruction Results 22
5 Conclusions and Future Works 24
5.1 Conclusion 24
5.2 Future Works 24
References 26
References

[1] Catalin Ionescu, Dragos Papava, Vlad Olaru, and Cristian Sminchisescu. Human3.6M: Large scale datasets and predictive methods for 3D human sensing in natural environments. IEEE Transactions on Pattern Analysis and Machine Intelligence, 36(7):1325–1339, 2013.
[2] Camillo Lugaresi, Jiuqiang Tang, Hadon Nash, Chris McClanahan, Esha Uboweja, Michael Hays, Fan Zhang, Chuo-Ling Chang, Ming Guang Yong, Juhyun Lee, et al. MediaPipe: A framework for building perception pipelines. arXiv preprint arXiv:1906.08172, 2019.
[3] Brian F. Allen and Petros Faloutsos. Evolved controllers for simulated locomotion. In International Workshop on Motion in Games, pages 219–230. Springer, 2009.
[4] Lucas Kovar, Michael Gleicher, and Frédéric Pighin. Motion graphs. In ACM SIGGRAPH 2008 Classes, pages 1–10, 2008.
[5] Edilson De Aguiar, Carsten Stoll, Christian Theobalt, Naveed Ahmed, Hans-Peter Seidel, and Sebastian Thrun. Performance capture from sparse multi-view video. In ACM SIGGRAPH 2008 Papers, pages 1–10, 2008.
[6] Janzaib Masood, Abdul Samad, Zulkafil Abbas, and Latif Khan. Evolution of locomotion controllers for snake robots. In 2016 2nd International Conference on Robotics and Artificial Intelligence (ICRAI), pages 164–169, 2016.
[7] Chuan Guo, Xinxin Zuo, Sen Wang, Shihao Zou, Qingyao Sun, Annan Deng, Minglun Gong, and Li Cheng. Action2Motion: Conditioned generation of 3D human motions, pages 2021–2029. Association for Computing Machinery, 2020.
[8] Dario Pavllo, David Grangier, and Michael Auli. QuaterNet: A quaternion-based recurrent model for human motion. arXiv preprint arXiv:1805.06485, 2018.
[9] Hyemin Ahn, Timothy Ha, Yunho Choi, Hwiyeon Yoo, and Songhwai Oh. Text2Action: Generative adversarial synthesis from language to action. In 2018 IEEE International Conference on Robotics and Automation (ICRA), pages 5915–5920. IEEE, 2018.
[10] Chaitanya Ahuja and Louis-Philippe Morency. Language2Pose: Natural language grounded pose forecasting. In 2019 International Conference on 3D Vision (3DV), pages 719–728. IEEE, 2019.
[11] Hsin-Ying Lee, Xiaodong Yang, Ming-Yu Liu, Ting-Chun Wang, Yu-Ding Lu, Ming-Hsuan Yang, and Jan Kautz. Dancing to music. Advances in Neural Information Processing Systems, 32, 2019.
[12] Alejandro Hernandez, Jurgen Gall, and Francesc Moreno-Noguer. Human motion prediction via spatio-temporal inpainting. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 7134–7143, 2019.
[13] Félix G. Harvey and Christopher Pal. Recurrent transition networks for character locomotion. In SIGGRAPH Asia 2018 Technical Briefs, pages 1–4, 2018.
[14] Yinglin Duan, Tianyang Shi, Zhengxia Zou, Yenan Lin, Zhehui Qian, Bohan Zhang, and Yi Yuan. Single-shot motion completion with transformer. arXiv preprint arXiv:2103.00776, 2021.
[15] Xinchen Yan, Akash Rastogi, Ruben Villegas, Kalyan Sunkavalli, Eli Shechtman, Sunil Hadap, Ersin Yumer, and Honglak Lee. MT-VAE: Learning motion transformations to generate multimodal human dynamics. In Proceedings of the European Conference on Computer Vision (ECCV), pages 265–281, 2018.
[16] Julieta Martinez, Michael J. Black, and Javier Romero. On human motion prediction using recurrent neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2891–2900, 2017.
[17] Ilya Sutskever, Oriol Vinyals, and Quoc V. Le. Sequence to sequence learning with neural networks. Advances in Neural Information Processing Systems, 27, 2014.
[18] Wei Mao, Miaomiao Liu, Mathieu Salzmann, and Hongdong Li. Learning trajectory dependencies for human motion prediction. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 9489–9497, 2019.
[19] Chi Zhou, Zhangjiong Lai, Suzhen Wang, Lincheng Li, Xiaohan Sun, and Yu Ding. Learning a deep motion interpolation network for human skeleton animations. Computer Animation and Virtual Worlds, 32(3-4):e2003, 2021.
[20] Jiaman Li, Ruben Villegas, Duygu Ceylan, Jimei Yang, Zhengfei Kuang, Hao Li, and Yajie Zhao. Task-generic hierarchical human motion prior using VAEs. In 2021 International Conference on 3D Vision (3DV), pages 771–781. IEEE, 2021.
[21] Yujun Cai, Yiwei Wang, Yiheng Zhu, Tat-Jen Cham, Jianfei Cai, Junsong Yuan, Jun Liu, Chuanxia Zheng, Sijie Yan, Henghui Ding, et al. A unified 3D human motion synthesis model via conditional variational auto-encoder. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 11645–11655, 2021.
[22] Jehee Lee, Jinxiang Chai, Paul S. A. Reitsma, Jessica K. Hodgins, and Nancy S. Pollard. Interactive control of avatars animated with human motion data. In Proceedings of the 29th Annual Conference on Computer Graphics and Interactive Techniques, pages 491–500, 2002.
[23] Wei Mao, Miaomiao Liu, and Mathieu Salzmann. History repeats itself: Human motion prediction via motion attention. In European Conference on Computer Vision, pages 474–489. Springer, 2020.
[24] Katerina Fragkiadaki, Sergey Levine, Panna Felsen, and Jitendra Malik. Recurrent network models for human dynamics. In Proceedings of the IEEE International Conference on Computer Vision, pages 4346–4354, 2015.
[25] Partha Ghosh, Jie Song, Emre Aksan, and Otmar Hilliges. Learning human motion models for long-term predictions. In 2017 International Conference on 3D Vision (3DV), pages 458–466. IEEE, 2017.
[26] Li-Yu Lin and Wen-Yu Su. 3D human motion interpolation and denoising with BiLSTM VAE and animated dataset. Master's thesis, National Cheng Kung University, 2022.
[27] Zhe Cao, Tomas Simon, Shih-En Wei, and Yaser Sheikh. Realtime multi-person 2D pose estimation using part affinity fields. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 7291–7299, 2017.
[28] Naureen Mahmood, Nima Ghorbani, Nikolaus F. Troje, Gerard Pons-Moll, and Michael J. Black. AMASS: Archive of motion capture as surface shapes, 2019.
[29] Hanbyul Joo, Hao Liu, Lei Tan, Lin Gui, Bart Nabbe, Iain Matthews, Takeo Kanade, Shohei Nobuhara, and Yaser Sheikh. Panoptic Studio: A massively multiview system for social motion capture. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), December 2015.
[30] Chen Kang, Zhipeng Tan, Jin Lei, Song-Hai Zhang, Yuan-Chen Guo, Weidong Zhang, and Shi-Min Hu. ChoreoMaster: Choreography-oriented music-driven dance synthesis. ACM Transactions on Graphics (TOG), 40(4), 2021.
[31] Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, and Ilya Sutskever. Learning transferable visual models from natural language supervision, 2021.
[32] Guy Tevet, Brian Gordon, Amir Hertz, Amit H. Bermano, and Daniel Cohen-Or. MotionCLIP: Exposing human motion generation to CLIP space, 2022.
[33] NeuronMocap, Inc. NeuronMocap website, 2023. https://neuronmocap.com/.
[34] Dushyant Mehta, Srinath Sridhar, Oleksandr Sotnychenko, Helge Rhodin, Mohammad Shafiei, Hans-Peter Seidel, Weipeng Xu, Dan Casas, and Christian Theobalt. VNect: Real-time 3D human pose estimation with a single RGB camera. ACM Transactions on Graphics, 36(4):1–14, July 2017.
[35] TurboSquid, Inc. TurboSquid website, 2023. https://www.turbosquid.com/.
[36] Sketchfab, Inc. Sketchfab website, 2023. https://sketchfab.com/.
[37] Yujun Cai, Liuhao Ge, Jun Liu, Jianfei Cai, Tat-Jen Cham, Junsong Yuan, and Nadia Magnenat Thalmann. Exploiting spatial-temporal relationships for 3D pose estimation via graph convolutional networks. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 2272–2281, 2019.
[38] Jun Liu, Henghui Ding, Amir Shahroudy, Ling-Yu Duan, Xudong Jiang, Gang Wang, and Alex C. Kot. Feature boosting network for 3D pose estimation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 42(2):494–501, 2019.
[39] Julieta Martinez, Rayat Hossain, Javier Romero, and James J. Little. A simple yet effective baseline for 3D human pose estimation. In Proceedings of the IEEE International Conference on Computer Vision, pages 2640–2649, 2017.
[40] Yujun Cai, Liuhao Ge, Jianfei Cai, and Junsong Yuan. Weakly-supervised 3D hand pose estimation from monocular RGB images. In Proceedings of the European Conference on Computer Vision (ECCV), pages 666–682, 2018.
[41] Chun-Wei Lai. 3dmotiongenerator, June 2023.