[1] W. Dai, A. Mujeeb, M. Erdt et al., "Soldering Defect Detection in Automatic Optical Inspection," Advanced Engineering Informatics, vol. 43, p. 101004, 2020.
[2] F. B. Gilbreth and R. T. Kent, Motion Study: A Method for Increasing the Efficiency of the Workman. D. Van Nostrand Company, 1911.
[3] M. Zeng, L. T. Nguyen, B. Yu et al., "Convolutional Neural Networks for Human Activity Recognition Using Mobile Sensors," in 6th International Conference on Mobile Computing, Applications and Services, Austin, TX, USA, Nov 6-7 2014: IEEE, pp. 197-205.
[4] N. Ho, P.-M. Wong, M. Chua et al., "Virtual Reality Training for Assembly of Hybrid Medical Devices," Multimedia Tools and Applications, vol. 77, no. 23, pp. 30651-30682, 2018.
[5] X. Yin, X. Fan, W. Zhu et al., "Synchronous AR Assembly Assistance and Monitoring System Based on Ego-Centric Vision," Assembly Automation, 2019.
[6] J. Carreira and A. Zisserman, "Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset," in IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, Jul 21-26 2017, pp. 6299-6308.
[7] Y. Sun, E. Lank, and M. Terry, "Label-and-Learn: Visualizing the Likelihood of Machine Learning Classifier's Success During Data Labeling," in 22nd International Conference on Intelligent User Interfaces, New York, USA, Mar 13-16 2017, pp. 523-534.
[8] S. Vishwakarma and A. Agrawal, "A Survey on Activity Recognition and Behavior Understanding in Video Surveillance," The Visual Computer, vol. 29, no. 10, pp. 983-1009, 2013.
[9] R. Poppe, "A Survey on Vision-Based Human Action Recognition," Image and Vision Computing, vol. 28, no. 6, pp. 976-990, 2010.
[10] K. Soomro, A. R. Zamir, and M. Shah, "UCF101: A Dataset of 101 Human Actions Classes From Videos in The Wild," arXiv preprint arXiv:1212.0402, 2012.
[11] A. Shahroudy, J. Liu, T.-T. Ng et al., "NTU RGB+D: A Large Scale Dataset for 3D Human Activity Analysis," in IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, USA, Jun 26 - Jul 01 2016, pp. 1010-1019.
[12] M. Fu, N. Chen, Z. Huang et al., "Human Action Recognition: A Survey," in International Conference on Signal and Information Processing, Networking and Computers, 2018: Springer, pp. 69-77.
[13] M. Moniruzzaman, Z. Yin, Z. H. He et al., "Human Action Recognition by Discriminative Feature Pooling and Video Segmentation Attention Model," IEEE Transactions on Multimedia, 2021.
[14] C. Li, Q. Huang, X. Li et al., "Human Action Recognition Based on Multi-Scale Feature Maps from Depth Video Sequences," arXiv preprint arXiv:2101.07618, 2021.
[15] C. Liu, J. Ying, H. Yang et al., "Improved Human Action Recognition Approach Based on Two-Stream Convolutional Neural Network Model," The Visual Computer, pp. 1-15, 2020.
[16] M. E. Kalfaoglu, S. Kalkan, and A. A. Alatan, "Late Temporal Modeling in 3D CNN Architectures with BERT for Action Recognition," in European Conference on Computer Vision, Aug 23-28 2020: Springer, pp. 731-747.
[17] J. Zang, L. Wang, Z. Liu et al., "Attention-Based Temporal Weighted Convolutional Neural Network for Action Recognition," in IFIP International Conference on Artificial Intelligence Applications and Innovations, Rhodes, Greece, May 25-27 2018: Springer, pp. 97-108.
[18] L. Wang, Y. Xu, J. Cheng et al., "Human Action Recognition by Learning Spatio-Temporal Features With Deep Neural Networks," IEEE Access, vol. 6, pp. 17913-17922, 2018.
[19] L. Wang, Y. Xiong, Z. Wang et al., "Temporal Segment Networks: Towards Good Practices for Deep Action Recognition," in European Conference on Computer Vision, Amsterdam, The Netherlands, Oct 8-16 2016: Springer, pp. 20-36.
[20] K. Simonyan and A. Zisserman, "Two-Stream Convolutional Networks for Action Recognition in Videos," arXiv preprint arXiv:1406.2199, 2014.
[21] S. Ji, W. Xu, M. Yang et al., "3D Convolutional Neural Networks for Human Action Recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 1, pp. 221-231, 2012.
[22] T. N. Kipf and M. Welling, "Semi-Supervised Classification with Graph Convolutional Networks," arXiv preprint arXiv:1609.02907, 2016.
[23] S. Yan, Y. Xiong, and D. Lin, "Spatial Temporal Graph Convolutional Networks for Skeleton-Based Action Recognition," in AAAI Conference on Artificial Intelligence, New Orleans, Louisiana, USA, Feb 2-7 2018, vol. 32, no. 1.
[24] M.-F. Tsai and C.-H. Chen, "Spatial Temporal Variation Graph Convolutional Networks (STV-GCN) for Skeleton-Based Emotional Action Recognition," IEEE Access, vol. 9, pp. 13870-13877, 2021.
[25] C. Liu, X. Li, Q. Li et al., "Robot Recognizing Humans Intention and Interacting with Humans Based on a Multi-Task Model Combining ST-GCN-LSTM Model and YOLO Model," Neurocomputing, vol. 430, pp. 174-184, 2021.
[26] U. Bhattacharya, T. Mittal, R. Chandra et al., "STEP: Spatial Temporal Graph Convolutional Networks for Emotion Perception from Gaits," in AAAI Conference on Artificial Intelligence, New York, USA, Feb 7-12 2020, vol. 34, no. 02, pp. 1342-1350.
[27] Y. Li, Z. He, X. Ye et al., "Spatial Temporal Graph Convolutional Networks for Skeleton-Based Dynamic Hand Gesture Recognition," EURASIP Journal on Image and Video Processing, vol. 2019, no. 1, pp. 1-7, 2019.
[28] D. Feng, Z. Wu, J. Zhang et al., "Multi-Scale Spatial Temporal Graph Neural Network for Skeleton-Based Action Recognition," IEEE Access, vol. 9, pp. 58256-58265, 2021.
[29] O. Keskes and R. Noumeir, "Vision-Based Fall Detection Using ST-GCN," IEEE Access, vol. 9, pp. 28224-28236, 2021.
[30] H. Duan, Y. Zhao, K. Chen et al., "Revisiting Skeleton-Based Action Recognition," arXiv preprint arXiv:2104.13586, 2021.
[31] J. Xie, W. Xin, R. Liu et al., "Cross-Channel Graph Convolutional Networks for Skeleton-Based Action Recognition," IEEE Access, vol. 9, pp. 9055-9065, 2021.
[32] J. Cai, N. Jiang, X. Han et al., "JOLO-GCN: Mining Joint-Centered Light-Weight Information for Skeleton-Based Action Recognition," in IEEE/CVF Winter Conference on Applications of Computer Vision, Jan 5-9 2021, pp. 2735-2744.
[33] X. Hao, J. Li, Y. Guo et al., "Hypergraph Neural Network for Skeleton-Based Action Recognition," IEEE Transactions on Image Processing, vol. 30, pp. 2263-2275, 2021.
[34] Y. Obinata and T. Yamamoto, "Temporal Extension Module for Skeleton-Based Action Recognition," in 25th International Conference on Pattern Recognition (ICPR), Milan, Italy, Jan 10-15 2021: IEEE, pp. 534-540.
[35] K. Cheng, Y. Zhang, X. He et al., "Skeleton-Based Action Recognition With Shift Graph Convolutional Network," in IEEE/CVF Conference on Computer Vision and Pattern Recognition, Jun 14-19 2020, pp. 183-192.
[36] L. Shi, Y. Zhang, J. Cheng et al., "Skeleton-Based Action Recognition with Multi-Stream Adaptive Graph Convolutional Networks," IEEE Transactions on Image Processing, vol. 29, pp. 9532-9545, 2020.
[37] L. Shi, Y. Zhang, J. Cheng et al., "Two-Stream Adaptive Graph Convolutional Networks for Skeleton-Based Action Recognition," in IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, USA, Jun 15-21 2019, pp. 12026-12035.
[38] C. Si, W. Chen, W. Wang et al., "Convolutional LSTM Network for Skeleton-Based Action Recognition," in IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, USA, Jun 15-21 2019, pp. 1227-1236.
[39] D. Liang, G. Fan, G. Lin et al., "Three-Stream Convolutional Neural Network With Multi-Task and Ensemble Learning for 3D Action Recognition," in IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Long Beach, USA, Jun 16-17 2019.
[40] C. Reining, F. Niemann, F. Moya Rueda et al., "Human Activity Recognition for Production and Logistics—A Systematic Literature Review," Information, vol. 10, no. 8, p. 245, 2019.
[41] Z. Cao, G. Hidalgo, T. Simon et al., "OpenPose: Realtime Multi-Person 2D Pose Estimation Using Part Affinity Fields," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 43, no. 1, pp. 172-186, 2019.
[42] H.-S. Fang, S. Xie, Y.-W. Tai et al., "RMPE: Regional Multi-Person Pose Estimation," in IEEE International Conference on Computer Vision (ICCV), Venice, Italy, Oct 22-29 2017, pp. 2334-2343.
[43] K. Kim and Y. K. Cho, "Effective Inertial Sensor Quantity and Locations on a Body for Deep Learning-Based Worker's Motion Recognition," Automation in Construction, vol. 113, p. 103126, 2020.
[44] B. Settles, "Active Learning Literature Survey," Computer Sciences Technical Report 1648, University of Wisconsin-Madison, 2009.
[45] Z. Zhou, J. Shin, L. Zhang et al., "Fine-Tuning Convolutional Neural Networks for Biomedical Image Analysis: Actively and Incrementally," in IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, Jul 21-26 2017, pp. 7340-7351.
[46] P. Dube, B. Bhattacharjee, S. Huo et al., "Automatic Labeling of Data for Transfer Learning," Nature, vol. 192255, 2019.
[47] H. Gammulle, T. Fernando, S. Denman et al., "Coupled Generative Adversarial Network for Continuous Fine-Grained Action Segmentation," in IEEE Winter Conference on Applications of Computer Vision (WACV), 2019: IEEE, pp. 200-209.
[48] Z. Wang, Z. Gao, L. Wang et al., "Boundary-Aware Cascade Networks for Temporal Action Segmentation," in European Conference on Computer Vision, 2020: Springer, pp. 34-51.
[49] G. Hidalgo, Z. Cao, T. Simon et al., "OpenPose: Real-time Multi-person Keypoint Detection Library for Body, Face, Hands, and Foot Estimation." https://github.com/CMU-Perceptual-Computing-Lab/openpose (accessed Jun 03, 2021).
[50] Z. Cao, T. Simon, S.-E. Wei et al., "Realtime Multi-Person 2D Pose Estimation Using Part Affinity Fields," in IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 2017, pp. 7291-7299.
[51] T. Simon, H. Joo, I. Matthews et al., "Hand Keypoint Detection in Single Images Using Multiview Bootstrapping," in IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 1145-1153.
[52] K. Simonyan and A. Zisserman, "Very Deep Convolutional Networks for Large-Scale Image Recognition," arXiv preprint arXiv:1409.1556, 2014.
[53] F. B. Gilbreth and L. M. Gilbreth, "Classifying the Elements of Work," Management and Administration, 1924.
[54] C.-L. Yang, W.-T. Li, and S.-C. Hsu, "Skeleton-Based Hand Gesture Recognition for Assembly Line Operation," in International Conference on Advanced Robotics and Intelligent Systems (ARIS), 2020: IEEE, pp. 1-6.
[55] W. Kay, J. Carreira, K. Simonyan et al., "The Kinetics Human Action Video Dataset," arXiv preprint arXiv:1705.06950, 2017.
[56] M. Müller, "Dynamic Time Warping," in Information Retrieval for Music and Motion, Springer, 2007, pp. 69-84.
[57] J. Redmon, S. Divvala, R. Girshick et al., "You Only Look Once: Unified, Real-Time Object Detection," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, Jun 27-30 2016, pp. 779-788.