National Digital Library of Theses and Dissertations in Taiwan

Detailed Record

Author: 卓諭
Author (English): CHO, YU
Title: 基於深度學習之即時人類動作識別預測系統
Title (English): Deep Learning Based Real-Time Human Action Recognition and Motion Prediction System
Advisor: 黃正民
Advisor (English): HUANG, CHENG-MING
Committee: 陸敬互、黃世勳、李俊賢
Committee (English): LU, CHING-HU; HUANG, SHIH-SHINH; LEE, JIN-SHYAN
Oral Defense Date: 2019-07-30
Degree: Master's
Institution: National Taipei University of Technology
Department: Electrical Engineering
Discipline: Engineering
Academic Field: Electrical and Computer Engineering
Document Type: Academic thesis
Publication Year: 2019
Graduation Academic Year: 107 (2018–2019)
Language: Chinese
Pages: 93
Keywords (Chinese): 深度學習、姿態估測、動作識別、運動預測
Keywords (English): Deep Learning, Human Pose Estimation, Human Action Recognition, Human Motion Prediction
Usage statistics:
  • Cited: 0
  • Views: 490
  • Rating: (none)
  • Downloads: 2
  • Bookmarked: 0
In the field of machine intelligence, enabling machines to understand human behavior is both a necessary task and a challenge. To improve the efficiency of human-machine interaction, the accuracy of scene understanding, and the early warning of unexpected situations, human action recognition and motion prediction have become key capabilities.
This thesis treats human action recognition as a classification task that assigns action labels to sequences of human poses. A simple action recognition network architecture is adopted to meet real-time requirements, and the results of motion prediction are further used to improve recognition performance. Human motion prediction is treated as a regression task that predicts a future pose sequence from the pose sequence of a past time window. Two additional features, the motion magnitude (momentum) of the skeleton and the estimation confidence of each joint, are introduced to mitigate the mean-pose problem common in motion prediction and to improve robustness when joints are occluded.
The experiments evaluate the real-time performance of the system, the action recognition performance with motion prediction added, and the quality of the motion prediction itself. The results verify that motion prediction information clearly improves action recognition performance, and that adding skeleton momentum and joint confidence also yields better motion prediction results.
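To make the prediction pipeline described above concrete, the following is a minimal PyTorch sketch, not the thesis's actual implementation, of a GRU sequence-to-sequence model that regresses future pose frames from an observed pose sequence, with each input frame augmented by per-joint confidence and motion-magnitude features as the abstract suggests. The joint count, layer sizes, class and function names, and the residual decoding step are all illustrative assumptions.

import torch
import torch.nn as nn

NUM_JOINTS = 18            # COCO-style 2D skeleton size; an assumption, not from the thesis
POSE_DIM = NUM_JOINTS * 2  # (x, y) coordinates per joint

class PoseSeq2Seq(nn.Module):
    def __init__(self, hidden=256):
        super().__init__()
        # Each input frame carries, per joint: 2 coordinates, 1 detection
        # confidence, and 1 motion magnitude -> 4 values per joint.
        self.encoder = nn.GRU(NUM_JOINTS * 4, hidden, batch_first=True)
        self.decoder = nn.GRUCell(POSE_DIM, hidden)
        self.out = nn.Linear(hidden, POSE_DIM)

    @staticmethod
    def add_motion_features(poses, conf):
        """poses: (B, T, J*2) joint coordinates; conf: (B, T, J) joint confidences."""
        B, T, _ = poses.shape
        xy = poses.view(B, T, NUM_JOINTS, 2)
        # Per-joint displacement magnitude between consecutive frames
        # ("momentum"); the first frame gets zero motion.
        vel = torch.zeros(B, T, NUM_JOINTS, device=poses.device)
        vel[:, 1:] = (xy[:, 1:] - xy[:, :-1]).norm(dim=-1)
        return torch.cat([poses, conf, vel], dim=-1)  # (B, T, J*4)

    def forward(self, poses, conf, future_len):
        x = self.add_motion_features(poses, conf)
        _, h = self.encoder(x)       # final hidden state: (1, B, hidden)
        h = h.squeeze(0)
        frame = poses[:, -1]         # seed decoding with the last observed pose
        preds = []
        for _ in range(future_len):
            h = self.decoder(frame, h)
            # Residual decoding: predict an offset from the previous frame, a
            # common way to keep outputs from collapsing toward a mean pose.
            frame = frame + self.out(h)
            preds.append(frame)
        return torch.stack(preds, dim=1)  # (B, future_len, J*2)

# Usage sketch: predict 10 future frames from 30 observed frames.
model = PoseSeq2Seq()
poses = torch.randn(4, 30, POSE_DIM)   # dummy pose sequences
conf = torch.rand(4, 30, NUM_JOINTS)   # dummy joint confidences in [0, 1]
future = model(poses, conf, future_len=10)
print(future.shape)                    # torch.Size([4, 10, 36])

Decoding residually from the last observed frame, rather than emitting absolute coordinates, is one common countermeasure to the mean-pose collapse the abstract mentions; the confidence channel lets the network down-weight joints that the pose estimator marked as unreliable or occluded.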
Table of Contents
Abstract (Chinese) i
Abstract (English) ii
Acknowledgements iv
List of Tables viii
List of Figures ix
Chapter 1 Introduction 1
1.1 Preface 1
1.2 Research Motivation 3
1.3 Research Results and Contributions 6
1.4 System Flow 7
1.5 Thesis Organization 8
Chapter 2 Related Work and Literature Review 9
2.1 Neural Networks 9
2.1.1 Overview of Neural Networks 9
2.1.2 Convolutional Neural Networks 10
2.1.3 Recurrent Neural Networks 12
2.2 Human Pose Estimation 16
2.3 Human Motion Prediction 18
2.4 Human Action Recognition 20
Chapter 3 Real-Time Human Action Prediction System 24
3.1 Human Pose Estimation 24
3.1.1 Subsystem Architecture 25
3.1.2 Symmetric STN 26
3.1.3 Pose NMS 27
3.2 Human-Pose-Based Motion Prediction 28
3.2.1 Subsystem Architecture 28
3.2.2 Sequence-to-Sequence Model 30
3.2.3 GRU Model Topology 31
3.2.4 Dual Loss 32
3.3 Human-Pose-Based Action Recognition 35
3.3.1 Subsystem Architecture 35
3.3.2 Skeleton Sequence Graph Construction 36
3.3.3 Spatial Graph Convolution Network 37
3.3.4 Temporal Graph Convolution Network 38
Chapter 4 System Architecture and Implementation 40
4.1 Real-Time System Architecture Implementation 40
4.2 Data Preprocessing 42
4.2.1 Pose Generation Error Handling Mechanism 43
4.2.2 Pose Sequence Organization 44
4.3 Neural Network Details 45
4.3.1 Data Batch Selection 45
4.3.2 Motion Prediction System 46
4.3.3 Action Recognition System 47
Chapter 5 Experimental Results 49
5.1 Action Recognition Datasets 50
5.1.1 NTU RGB+D Action Recognition Dataset 50
5.2 Experimental Analysis of Human-Pose-Based Motion Prediction 52
5.2.1 Selection of a Subset of Action Labels 53
5.2.2 Performance Evaluation Metrics 54
5.2.3 Training Performance Evaluation of Recurrent Network Units 56
5.2.4 Optimizer Training Comparison 58
5.2.5 Loss Function Comparison 60
5.2.6 Performance Evaluation of Feature Diversity in Training Data 68
5.2.7 Weighted-Average Comparison for the Dual Loss 70
5.2.8 Joint Accuracy Evaluation 71
5.3 Experimental Analysis of Human-Pose-Based Action Recognition 79
5.3.1 Dataset Performance Comparison 79
5.3.2 Performance Evaluation of Motion Prediction Feedback 83
5.4 Real-Time System Experimental Analysis 84
5.4.1 Subsystem Speed Evaluation 84
5.4.2 Real-Time System Experiments 85
Chapter 6 Conclusions and Future Work 88
6.1 Conclusions 88
6.2 Future Work 89
References 90