臺灣博碩士論文加值系統 (National Digital Library of Theses and Dissertations in Taiwan)

Detailed Record

Author: 黃詠筑 (HUANG, YONG-JHU)
Title: Skeleton-Based Action Recognition With Graph Convolutional Networks
Title (Chinese): 基於人體骨架序列與圖形卷積神經網路之動作辨識
Advisor: 賴文能 (LIE, WEN-NUNG)
Committee Members: 賴文能 (LIE, WEN-NUNG), 江瑞秋 (CHIANG, JUI-CHIU), 余松年 (YU, SUNG-NIEN), 黃敬群 (HUANG, CHING-CHUN)
Oral Defense Date: 2020-07-28
Degree: Master's
Institution: National Chung Cheng University (國立中正大學)
Department: Graduate Institute of Electrical Engineering
Discipline: Engineering
Field: Electrical and Computer Engineering
Document Type: Academic thesis
Year of Publication: 2020
Academic Year of Graduation: 108 (2019-2020)
Language: Chinese
Pages: 68
Keywords (Chinese): 動作辨識, 圖形卷積神經網路, 人體骨架, 深度學習網路
Keywords (English): Action Recognition, Graph Convolutional Networks, Skeleton-Based, Deep Learning
Usage statistics:
  • Cited by: 0
  • Views: 639
  • Downloads: 109
  • Bookmarked: 0
Abstract

With rapid technological progress, computer applications have grown increasingly diverse, and artificial intelligence has become a prominent research area in recent years. Human action recognition occupies a pivotal, core position in computer vision and AI research, with broad applications including human-computer interaction, video surveillance, and video analysis. Input data for human action recognition falls broadly into two categories: RGB video and 3D human skeleton sequences. Video is a challenging input for action recognition because the human subject in the foreground is entangled with a complex, temporally varying background. In contrast, a 3D skeleton sequence exposes the continuous temporal evolution of the skeleton, supporting precise action recognition, and many studies have shown that accurate 3D human skeletons can now be acquired with RGB-D or RGB cameras. This thesis therefore adopts 3D human skeleton sequences as the input form for human action recognition.
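As a rough, hypothetical illustration of this input format (not taken from the thesis), the Python sketch below shows how a 3D skeleton sequence and its joint graph are commonly represented; the 25-joint layout and the sample bones follow the NTU RGB+D convention, and all variable names are illustrative.

import numpy as np

# One action sample: C coordinate channels, T frames, V joints.
T, V, C = 300, 25, 3
skeleton = np.zeros((C, T, V))   # (x, y, z) per joint per frame

# A GCN additionally needs the skeleton graph: an adjacency matrix that
# connects physically adjacent joints. Only a few NTU bones are listed
# here for illustration; the full 25-joint skeleton has 24 bones.
A = np.zeros((V, V))
bones = [(0, 1), (1, 20), (20, 2), (2, 3)]
for i, j in bones:
    A[i, j] = A[j, i] = 1.0      # undirected edge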

This thesis proposes an action recognition system based on 3D human skeleton sequences. It focuses on a preprocessing stage that extracts high-order spatio-temporal features from the skeleton sequence; these features are then fed into a graph convolutional network (GCN) trained end to end. The skeleton features are combined with the GCN in two ways: early fusion and late fusion. In early fusion, the spatio-temporal features of the skeleton sequence are concatenated before entering the deep model, expanding the number of input channels, and a single network then predicts the action class. In late fusion, features of different kinds are fed into separate network models, each producing a prediction score; the scores of all models are combined by weighted fusion to obtain the final predicted action class.
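To make the two fusion schemes concrete, here is a minimal PyTorch sketch contrasting channel concatenation (early fusion) with weighted score combination (late fusion). GCNModel is a stand-in placeholder rather than the thesis's actual network, and the tensor shapes, feature channel counts, and fusion weights are assumptions.

import torch
import torch.nn as nn

class GCNModel(nn.Module):
    # Placeholder for a skeleton GCN: maps (N, C, T, V) to per-class scores.
    def __init__(self, num_classes=60):   # 60 classes, as in NTU RGB+D
        super().__init__()
        self.net = nn.Sequential(nn.Flatten(), nn.LazyLinear(num_classes))

    def forward(self, x):
        return self.net(x)

N, T, V = 8, 300, 25                      # batch, frames, joints
joints = torch.randn(N, 3, T, V)          # raw 3D joint coordinates
high_order = torch.randn(N, 6, T, V)      # assumed high-order spatio-temporal features

# Early fusion: concatenate along the channel axis, then train one model.
early_in = torch.cat([joints, high_order], dim=1)   # (N, 9, T, V)
early_scores = GCNModel()(early_in)

# Late fusion: a separate model per feature type; weight and sum the scores.
m1, m2 = GCNModel(), GCNModel()
w1, w2 = 0.5, 0.5                         # illustrative fusion weights
late_scores = w1 * m1(joints) + w2 * m2(high_order)
pred = late_scores.argmax(dim=1)          # final predicted action class

The two schemes differ only in where the combination happens: before the network (one model, more input channels) or after it (one model per feature, fused scores).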

We train the graph convolutional networks on the NTU dataset. Experimental results show that the added high-order spatio-temporal information improves the recognition accuracy of the baseline deep models under both early and late fusion, by up to about 0.73%. This demonstrates that the proposed preprocessing supplies additional discriminative features for 3D human skeleton sequences; the experiments also show that late fusion achieves better results than early fusion.

Table of Contents

Abstract I
Table of Contents III
List of Figures V
List of Tables VII
Chapter 1 Introduction 1
1.1 Background and Motivation 1
1.2 Related Work 2
1.2.1 Classification by Network Architecture 2
1.2.2 Classification by Specific Function 8
1.3 Thesis Organization 11
Chapter 2 Preprocessing of Human Skeleton Sequence Information 14
2.1 Spatial Information 14
2.2 Temporal Information 18
Chapter 3 Action Recognition Based on Graph Convolutional Networks 21
3.1 Spatial Temporal Graph Convolutional Networks [12] 21
3.1.1 Normalized Adjacency Matrix of an Undirected Graph at Distance 1 24
3.1.2 Normalized Adjacency Matrix of an Undirected Graph at Distance 2 26
3.1.3 Spatial Temporal Graph Convolution Model 28
3.2 Attention Adaptive Graph Convolutional Networks [14] 29
3.2.1 Normalized Adjacency Matrix of a Directed Graph 29
3.2.2 Attention Adaptive Model 34
3.3 Loss Function 37
Chapter 4 Experimental Results and Discussion 38
4.1 Experimental Environment 38
4.2 Dataset 38
4.3 Network Hyperparameter Settings 40
4.4 ST-GCN with Skeleton Sequence Preprocessing: Results and Comparison 40
4.4.1 Early Fusion 40
4.4.2 Late Fusion 42
4.5 Attention Adaptive GCN with Skeleton Sequence Preprocessing: Results 44
4.5.1 Early Fusion 44
4.5.2 Late Fusion 45
4.6 Ablation Experiments 47
Chapter 5 Conclusions and Future Work 52
5.1 Conclusions 52
5.2 Future Work 52
References 53


References

[1] F. Cruciani, A. Vafeiadis, C. Nugent, I. Cleland, P. McCullagh, K. Votis, D. Giakoumis, D. Tzovaras, L. Chen and R. Hamzaoui, “Feature Learning for Human Activity Recognition Using Convolutional Neural Networks,” CCF Transactions on Pervasive Computing and Interaction, Mar. 2020.
[2] F. Li, K. Shirahama, M. A. Nisar, L. Köping and M. Grzegorzek, “Comparison of Feature Learning Methods for Human Activity Recognition Using Wearable Sensors,” Sensors, Feb. 2018.
[3] L. B. Marinho, A. H. de Souza Junior and P. P. Rebouças Filho, “A New Approach to Human Activity Recognition Using Machine Learning Techniques,” Proc. of Int'l Conf. on Intelligent Systems Design and Applications (ISDA), Feb. 2017.
[4] B. Li, Y. Dai, X. Cheng, H. Chen, Y. Lin and M. He, “Skeleton Based Action Recognition Using Translation-Scale Invariant Image Mapping and Multi-Scale Deep CNN,” Proc. of IEEE Int'l Conf. on Multimedia & Expo Workshops (ICMEW), Jul. 2017.
[5] C. Li, Q. Zhong, D. Xie and S. Pu, “Co-occurrence Feature Learning from Skeleton Data for Action Recognition and Detection with Hierarchical Aggregation,” Proc. of Int'l Joint Conf. on Artificial Intelligence (IJCAI), Apr. 2018.
[6] H. Wang and L. Wang, “Modeling Temporal Dynamics and Spatial Configurations of Actions Using Two-Stream Recurrent Neural Networks,” Proc. of IEEE Int'l Conf. on Computer Vision and Pattern Recognition (CVPR), Apr. 2017.
[7] R. Cui, A. Zhu, S. Zhang and G. Hu, “Multi-source Learning for Skeleton-based Action Recognition Using Deep LSTM Networks,” Proc. of IEEE Int'l Conf. on Pattern Recognition (ICPR), pp. 547-552, Aug. 2018.
[8] S. Song, C. Lan, J. Xing, W. Zeng and J. Liu, “Skeleton-Indexed Deep Multi-Modal Feature Learning for High Performance Human Action Recognition,” Proc. of IEEE Int'l Conf. on Multimedia and Expo (ICME), pp. 1-6, Jul. 2018.
[9] P. Zhang, C. Lan, J. Xing, W. Zeng, J. Xue and N. Zheng, “View Adaptive Neural Networks for High Performance Skeleton-Based Human Action Recognition,” IEEE Trans. on Pattern Analysis and Machine Intelligence (TPAMI), Vol. 41, No. 8, pp. 1963-1978, Aug. 2019.
[10] S. Wei, Y. Song and Y. Zhang, “Human Skeleton Tree Recurrent Neural Network with Joint Relative Motion Feature for Skeleton Based Action Recognition,” Proc. of IEEE Int'l Conf. on Pattern Recognition (ICPR), pp. 91-95, Sept. 2017.
[11] S. Yan, Y. Xiong and D. Lin, “Spatial Temporal Graph Convolutional Networks for Skeleton-Based Action Recognition,” Proc. of AAAI Conf. on Artificial Intelligence (AAAI), arXiv:1801.07455, 2018.
[12] Y. Song, Z. Zhang and L. Wang, “Richly Activated Graph Convolutional Network for Action Recognition with Incomplete Skeletons,” Proc. of IEEE Int'l Conf. on Image Processing (ICIP), pp. 1-5, Sept. 2019.
[13] L. Shi, Y. Zhang, J. Cheng and H. Lu, “Two-Stream Adaptive Graph Convolutional Networks for Skeleton-Based Action Recognition,” Proc. of IEEE Int'l Conf. on Computer Vision and Pattern Recognition (CVPR), pp. 12018-12027, Jun. 2019.
[14] L. Shi, Y. Zhang, J. Cheng and H. Lu, “Skeleton-Based Action Recognition with Multi-Stream Adaptive Graph Convolutional Networks,” Proc. of IEEE Int'l Conf. on Computer Vision and Pattern Recognition (CVPR), Dec. 2019.
[15] P. Zhang, C. Lan, J. Xing, W. Zeng, J. Xue and N. Zheng, “View Adaptive Recurrent Neural Networks for High Performance Human Action Recognition from Skeleton Data,” IEEE Trans. on Pattern Analysis and Machine Intelligence (TPAMI), Vol. 41, No. 8, pp. 1963-1978, Jan. 2019.
[16] R. Zhao, H. Ali and P. van der Smagt, “Two-Stream RNN/CNN for Action Recognition in 3D Videos,” Proc. of IEEE/RSJ Int'l Conf. on Intelligent Robots and Systems (IROS), pp. 4260-4267, Sept. 2017.
[17] J. Liu, A. Shahroudy, D. Xu and G. Wang, “Spatio-Temporal LSTM with Trust Gates for 3D Human Action Recognition,” Proc. of The European Conference on Computer Vision (ECCV), 2016.
[18] J. Xu, K. Tasaka and H. Yanagihara, “Beyond Two-Stream: Skeleton-Based Three-Stream Networks for Action Recognition in Videos,” Proc. of IEEE Int'l Conf. on Pattern Recognition (ICPR), pp. 1567-1573, Aug. 2018.
[19] S. Woo, J. Park, J. Y. Lee and I. S. Kweon, “CBAM: Convolutional Block Attention Module,” Proc. of The European Conference on Computer Vision (ECCV), Jul. 2018.
[20] C. Xie, C. Li, B. Zhang, C. Chen, J. Han, C. Zou and J. Liu, “Memory Attention Networks for Skeleton-based Action Recognition,” Proc. of Int'l Joint Conf. on Artificial Intelligence (IJCAI), Apr. 2018.
[21] S. Song, C. Lan, J. Xing, W. Zeng and J. Liu, “An End-to-End Spatio-Temporal Attention Model for Human Action Recognition from Skeleton Data,” Proc. of AAAI Conf. on Artificial Intelligence (AAAI), 2017.
[22] J. Liu, G. Wang, P. Hu, L. Duan and A. C. Kot, “Global Context-Aware Attention LSTM Networks for 3D Action Recognition,” Proc. of IEEE Int'l Conf. on Computer Vision and Pattern Recognition (CVPR), pp. 3671-3680, Jul. 2017.
[23] J. Liu, G. Wang, L. Duan, K. Abdiyeva and A. C. Kot, “Skeleton-Based Human Action Recognition With Global Context-Aware Attention LSTM Networks,” IEEE Trans. on Image Processing (TIP), Vol. 27, No. 4, pp. 1586-1599, Apr. 2018.
[24] R. Xiao, Y. Hou, Z. Guo, C. Li, P. Wang and W. Li, “Self-Attention Guided Deep Features for Action Recognition,” Proc. of IEEE Int'l Conf. on Multimedia and Expo (ICME), pp. 1060-1065, Jul. 2019.
[25] A. Shahroudy, J. Liu, T. Ng and G. Wang, “NTU RGB+D: A Large Scale Dataset for 3D Human Activity Analysis,” Proc. of IEEE Int'l Conf. on Computer Vision and Pattern Recognition (CVPR), pp. 1010-1019, Jun. 2016.
[26] B. Zhou, A. Khosla, A. Lapedriza, A. Oliva and A. Torralba, “Learning Deep Features for Discriminative Localization,” Proc. of IEEE Int'l Conf. on Computer Vision and Pattern Recognition (CVPR), pp. 2921-2929, Jun. 2016.
[27] C. Lea, M. D. Flynn, R. Vidal, A. Reiter and G. D. Hager, “Temporal Convolutional Networks for Action Segmentation and Detection,” Proc. of IEEE Int'l Conf. on Computer Vision and Pattern Recognition (CVPR), pp. 1003-1012, Jul. 2017.
[28] K. He, X. Zhang, S. Ren and J. Sun, “Deep Residual Learning for Image Recognition,” Proc. of IEEE Int'l Conf. on Computer Vision and Pattern Recognition (CVPR), pp. 770-778, Jun. 2016.
[29] L. Shi, Y. Zhang, J. Cheng and H. Lu, “Non-Local Graph Convolutional Networks for Skeleton-Based Action Recognition,” arXiv:1805.07694, 2018.
[30] H. Wang, Y. Fan, Z. Wang, L. Jiao and B. Schiele, “Parameter-Free Spatial Attention Network for Person Re-Identification,” Proc. of IEEE Int'l Conf. on Computer Vision and Pattern Recognition (CVPR), Nov. 2018.
[31] Y. Zeng, X. Guo, H. Wang, M. Geng and T. Lu, “Efficient Dual Attention Module for Real-Time Visual Tracking,” Proc. of IEEE Int'l Conf. on Visual Communications and Image Processing (VCIP), pp. 1-4, Dec. 2019.
[32] Y. Zhang, K. Li, K. Li, L. Wang, B. Zhong and Y. Fu, “Image Super-Resolution Using Very Deep Residual Channel Attention Networks,” Proc. of The European Conference on Computer Vision (ECCV), 2018.
[33] L. Chen, H. Zhang, J. Xiao, L. Nie, J. Shao, W. Liu and T. S. Chua, “SCA-CNN: Spatial and Channel-wise Attention in Convolutional Networks for Image Captioning,” Proc. of IEEE Int'l Conf. on Computer Vision and Pattern Recognition (CVPR), Nov. 2016.
[34] H. Ling, J. Wu, L. Wu, J. Huang, J. Chen and P. Li, “Self Residual Attention Network for Deep Face Recognition,” IEEE Access, Apr. 2019.
