National Digital Library of Theses and Dissertations in Taiwan

Detailed Record

Author: 卓諭
Author (English): CHO, YU
Title: 基於深度學習之即時人類動作識別預測系統
Title (English): Deep Learning Based Real-Time Human Action Recognition and Motion Prediction System
Advisor: 黃正民
Advisor (English): HUANG, CHENG-MING
Committee: 陸敬互、黃世勳、李俊賢
Committee (English): LU, CHING-HU; HUANG, SHIH-SHINH; LEE, JIN-SHYAN
Oral Defense Date: 2019-07-30
Degree: Master's
Institution: National Taipei University of Technology
Department: Electrical Engineering
Discipline: Engineering
Academic Field: Electrical and Computer Engineering
Document Type: Academic thesis
Publication Year: 2019
Graduation Academic Year: 107 (2018–2019)
Language: Chinese
Pages: 93
Keywords (Chinese): 深度學習、姿態估測、動作識別、運動預測
Keywords (English): Deep Learning, Human Pose Estimation, Human Action Recognition, Human Motion Prediction
Usage statistics:
  • Cited: 0
  • Views: 490
  • Rating: (none)
  • Downloads: 2
  • Bookmarked: 0
In the field of machine intelligence, enabling machines to understand human behavior is both a necessary task and a challenge. To improve the efficiency of human-machine interaction, the accuracy of scene understanding, and the early warning of unexpected situations, human action recognition and motion prediction have become key capabilities.
This thesis treats human action recognition as a classification task that assigns action labels to sequences of human poses. A simple action recognition network architecture is adopted to meet real-time requirements, and the results of motion prediction are further used to improve recognition performance. Human motion prediction is treated as a regression task that predicts a future pose sequence from the pose sequence of a past time window. Two additional features, the motion magnitude (momentum) of the skeleton and the estimation confidence of each joint, are introduced to mitigate the mean-pose problem common in motion prediction and to improve robustness when joints are occluded.
The experiments evaluate the real-time performance of the system, the action recognition performance with motion prediction added, and the quality of the motion prediction itself. The results verify that motion prediction information clearly improves action recognition performance, and that adding skeleton momentum and joint confidence also yields better motion prediction results.
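To make the prediction pipeline described above concrete, the following is a minimal PyTorch sketch, not the thesis's actual implementation, of a GRU sequence-to-sequence model that regresses future pose frames from an observed pose sequence, with each input frame augmented by per-joint confidence and motion-magnitude features as the abstract suggests. The joint count, layer sizes, class and function names, and the residual decoding step are all illustrative assumptions.

import torch
import torch.nn as nn

NUM_JOINTS = 18            # COCO-style 2D skeleton size; an assumption, not from the thesis
POSE_DIM = NUM_JOINTS * 2  # (x, y) coordinates per joint

class PoseSeq2Seq(nn.Module):
    def __init__(self, hidden=256):
        super().__init__()
        # Each input frame carries, per joint: 2 coordinates, 1 detection
        # confidence, and 1 motion magnitude -> 4 values per joint.
        self.encoder = nn.GRU(NUM_JOINTS * 4, hidden, batch_first=True)
        self.decoder = nn.GRUCell(POSE_DIM, hidden)
        self.out = nn.Linear(hidden, POSE_DIM)

    @staticmethod
    def add_motion_features(poses, conf):
        """poses: (B, T, J*2) joint coordinates; conf: (B, T, J) joint confidences."""
        B, T, _ = poses.shape
        xy = poses.view(B, T, NUM_JOINTS, 2)
        # Per-joint displacement magnitude between consecutive frames
        # ("momentum"); the first frame gets zero motion.
        vel = torch.zeros(B, T, NUM_JOINTS, device=poses.device)
        vel[:, 1:] = (xy[:, 1:] - xy[:, :-1]).norm(dim=-1)
        return torch.cat([poses, conf, vel], dim=-1)  # (B, T, J*4)

    def forward(self, poses, conf, future_len):
        x = self.add_motion_features(poses, conf)
        _, h = self.encoder(x)       # final hidden state: (1, B, hidden)
        h = h.squeeze(0)
        frame = poses[:, -1]         # seed decoding with the last observed pose
        preds = []
        for _ in range(future_len):
            h = self.decoder(frame, h)
            # Residual decoding: predict an offset from the previous frame, a
            # common way to keep outputs from collapsing toward a mean pose.
            frame = frame + self.out(h)
            preds.append(frame)
        return torch.stack(preds, dim=1)  # (B, future_len, J*2)

# Usage sketch: predict 10 future frames from 30 observed frames.
model = PoseSeq2Seq()
poses = torch.randn(4, 30, POSE_DIM)   # dummy pose sequences
conf = torch.rand(4, 30, NUM_JOINTS)   # dummy joint confidences in [0, 1]
future = model(poses, conf, future_len=10)
print(future.shape)                    # torch.Size([4, 10, 36])

Decoding residually from the last observed frame, rather than emitting absolute coordinates, is one common countermeasure to the mean-pose collapse the abstract mentions; the confidence channel lets the network down-weight joints that the pose estimator marked as unreliable or occluded.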
Table of Contents
Abstract (Chinese) i
Abstract (English) ii
Acknowledgements iv
List of Tables viii
List of Figures ix
Chapter 1 Introduction 1
1.1 Preface 1
1.2 Research Motivation 3
1.3 Research Results and Contributions 6
1.4 System Flow 7
1.5 Thesis Organization 8
Chapter 2 Related Work and Literature Review 9
2.1 Neural Networks 9
2.1.1 Overview of Neural Networks 9
2.1.2 Convolutional Neural Networks 10
2.1.3 Recurrent Neural Networks 12
2.2 Human Pose Estimation 16
2.3 Human Motion Prediction 18
2.4 Human Action Recognition 20
Chapter 3 Real-Time Human Action Prediction System 24
3.1 Human Pose Estimation 24
3.1.1 Subsystem Architecture 25
3.1.2 Symmetric STN 26
3.1.3 Pose NMS 27
3.2 Human-Pose-Based Motion Prediction 28
3.2.1 Subsystem Architecture 28
3.2.2 Sequence-to-Sequence Model 30
3.2.3 GRU Model Topology 31
3.2.4 Dual Loss 32
3.3 Human-Pose-Based Action Recognition 35
3.3.1 Subsystem Architecture 35
3.3.2 Skeleton Sequence Graph Construction 36
3.3.3 Spatial Graph Convolution Network 37
3.3.4 Temporal Graph Convolution Network 38
Chapter 4 System Architecture and Implementation 40
4.1 Real-Time System Architecture Implementation 40
4.2 Data Preprocessing 42
4.2.1 Pose Generation Error Handling Mechanism 43
4.2.2 Pose Sequence Organization 44
4.3 Neural Network Details 45
4.3.1 Data Batch Selection 45
4.3.2 Motion Prediction System 46
4.3.3 Action Recognition System 47
Chapter 5 Experimental Results 49
5.1 Action Recognition Datasets 50
5.1.1 NTU RGB+D Action Recognition Dataset 50
5.2 Experimental Analysis of Human-Pose-Based Motion Prediction 52
5.2.1 Selection of a Subset of Action Labels 53
5.2.2 Performance Evaluation Metrics 54
5.2.3 Training Performance Evaluation of Recurrent Network Units 56
5.2.4 Optimizer Training Comparison 58
5.2.5 Loss Function Comparison 60
5.2.6 Performance Evaluation of Feature Diversity in Training Data 68
5.2.7 Weighted-Average Comparison for the Dual Loss 70
5.2.8 Joint Accuracy Evaluation 71
5.3 Experimental Analysis of Human-Pose-Based Action Recognition 79
5.3.1 Dataset Performance Comparison 79
5.3.2 Performance Evaluation of Motion Prediction Feedback 83
5.4 Real-Time System Experimental Analysis 84
5.4.1 Subsystem Speed Evaluation 84
5.4.2 Real-Time System Experiments 85
Chapter 6 Conclusions and Future Work 88
6.1 Conclusions 88
6.2 Future Work 89
References 90