Author: Noorkholis Luthfil Hakim (諾哈金)
Thesis title: FINGER TRACKING AND GESTURE RECOGNITION USING STATE-BASED CONTROL AND DEEP LEARNING
Thesis title (Chinese): 使用手勢狀態控制及深度學習的手勢追蹤方法
Advisor: Timothy K. Shih (施國琛)
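Degree: Doctoral
Institution: National Central University (國立中央大學)
Department: Department of Computer Science and Information Engineering (資訊工程學系)
Discipline: Engineering
Field: Electrical and Computer Engineering
Year of publication: 2020
Academic year of graduation: 108 (ROC calendar, i.e. 2019-2020)
Language: English
Number of pages: 96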
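Keywords (translated from Chinese): human-computer interaction; hand gesture; hand gesture recognition; finger detection; finger tracking; finite state machine; deep learning
Keywords (English): Human-Computer Interaction; Hand Gesture; Hand Gesture Recognition; Neural Network; Deep Learning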
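Abstract (translated from Chinese): As computing has evolved from the personal computer to today's smartphones, communication and interaction between people and computers have become increasingly important. The variety of applications has increased the demand for interaction with complex software, which has led to a range of human-computer interaction research based on user-friendly input, and the most common form of such input is the hand gesture. We therefore study gesture recognition systems from both a local and a global perspective. At the local level, finger recognition relies on finger tracking and detection and enables recognition tasks such as guitar strumming, glove-puppet manipulation, and typing on a virtual keyboard; these gesture behaviors can be represented by a finite state machine model. Combining this model with a conventional appearance-based finger tracking method, we propose a finite-state-machine-based gesture recognition method and test its robustness in experiments on a simple example gesture. In these experiments, gesture recognition reaches a recognition rate of 82%. From the global perspective, we propose a deep-learning-based hand gesture recognition method that applies 3DCNN and LSTM to sequence data. After collecting a purpose-designed dataset, we tested the model and obtained robust behavior in a real-time application. The experimental results show an accuracy of 97% in offline testing and 92% in the real-time application.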
Abstract (English): Interaction between humans and computers has been important from the birth of the personal computer to today's smartphones. Demand for more natural interaction in many kinds of complex applications has driven research on natural interaction design. The most common and most natural way to interact is through gestures. In this work we therefore study gesture recognition at two levels, local and global. The local level concerns finger-level gestures, which depend on finger detection and tracking. Here we address finger-level recognition of repeating, finite gestures, for example guitar strumming, hand puppet actions, or typing on a virtual keyboard. Gestures of this kind can be represented by a finite state machine (FSM) model. By combining the FSM with a fast but less accurate appearance-based method, we propose a novel FSM-based finger pose tracking method. To test the robustness of the proposed system, we conducted an experiment on one simple repeating gesture; the system reached a recognition rate of 82% in the testing phase. At the global level, we propose deep-learning-based hand gesture recognition using 3DCNN and LSTM on sequence data. We designed and collected our own dataset to test the robustness of the model in a real-time application. The results show 97% accuracy in offline testing and 92% accuracy in the real-time application.
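The idea of constraining a noisy per-frame detector with a finite state machine can be illustrated with a minimal sketch. This is not the thesis's FSM-FT implementation: the pose labels ("rest", "down", "up"), the transition table, and the names GestureFSM and track are all hypothetical, chosen only to show how a repeating gesture such as a strum cycle might reject observations that the state model does not allow.

```python
from dataclasses import dataclass

# Hypothetical pose labels for one repeating gesture (for example a strum cycle).
# Each state maps to the set of states it may move to next.
# Remaining in the current state is always allowed.
TRANSITIONS = {
    "rest": {"down"},
    "down": {"up"},
    "up": {"down", "rest"},
}


@dataclass
class GestureFSM:
    state: str = "rest"
    cycles: int = 0    # completed down -> up strokes
    rejected: int = 0  # observations discarded as implausible

    def step(self, observed: str) -> str:
        """Feed one noisy per-frame pose label and return the filtered state."""
        if observed == self.state:
            return self.state
        if observed in TRANSITIONS[self.state]:
            if self.state == "down" and observed == "up":
                self.cycles += 1
            self.state = observed
        else:
            # The detector reported a pose the model cannot reach from here,
            # so keep the current state and count the observation as noise.
            self.rejected += 1
        return self.state


def track(observations):
    """Run a fresh FSM over a sequence of labels; return filtered labels and the FSM."""
    fsm = GestureFSM()
    filtered = [fsm.step(label) for label in observations]
    return filtered, fsm
```

For instance, calling track(["rest", "down", "down", "rest", "up", "down", "up", "rest"]) treats the fourth label as noise, because "rest" is not reachable from "down" in this assumed table, and counts two completed strokes. A real system would derive the states and transitions from recorded finger data rather than hard-code them.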
CHINESE ABSTRACT (摘要)
ABSTRACT
ACKNOWLEDGEMENTS
CONTENTS
LIST OF FIGURES
LIST OF TABLES
CHAPTER 1. INTRODUCTION
1.1 GENERAL INTRODUCTION
1.2 OBJECTIVE OF RESEARCH
1.3 SCOPE OF THE STUDY
1.4 DISSERTATION OUTLINE
CHAPTER 2. LITERATURE REVIEW
2.1 HUMAN COMPUTER INTERACTION
2.1.1 Gestures
2.1.2 Hand Gestures
2.2 HAND GLOBAL INTERACTION
2.2.1 Type of Gestures
2.2.2 Hand Gesture Recognition
2.3 HAND LOCAL INTERACTION
2.3.1 Appearance based finger representation
2.3.2 Model based finger representation
2.4 FINITE STATE MACHINE
2.4.1 Finite State Machine for Hand gesture Recognition
2.5 ARTIFICIAL NEURAL NETWORK
2.6 DEEP NEURAL NETWORK
2.6.1 Convolutional Neural Network
2.6.2 Long short-term Memory Network
2.6.3 3D Convolutional Neural Network
2.7 DEEP NEURAL NETWORK IN HAND GESTURE RECOGNITION
CHAPTER 3. HAND POSE TRACKING WITH USING FSM AS THE RESTRICTED MODEL
3.1 INTRODUCTION
3.2 ARCHITECTURE AND FINGER EXTRACTION
3.2.1 Hand Segmentation and Arm Removal
3.2.2 3D Finger Detection
3.2.3 3D Finger Tracking
3.2.4 Smoothing the Finger
3.3 DATA PREPARATION
3.3.1 Hand Sample Data Collections
3.3.2 Data Normalization
3.3.3 Clustering the Finger
3.4 MAIN METHODOLOGY
3.4.1 FSM-FT Builder
3.4.2 FSM-FT Runner
CHAPTER 4. STATIC AND DYNAMIC HAND GESTURE RECOGNITION USING COMBINATION OF 3DCNN AND LSTM WITH FSM CONTROL MODEL FOR REUSABLE GESTURES
4.1 INTRODUCTION
4.2 GESTURE DESIGN
4.3 DATA COLLECTION
4.4 DATA PREPROCESSING
4.5 PROPOSED MODEL
4.5.1 Multimodal Input Model
4.5.2 Context-Aware Neural Network
4.6 FSM CONTROLLER MODEL
CHAPTER 5. RESULT AND DISCUSSION
5.1 FSM-FT RESULT AND DISCUSSION
5.1.1 Experimental Setup
5.1.2 First Experiment's Result
5.1.3 Second Experiment's Result
5.1.4 Comparison with existing method
5.2 HAND GESTURE RECOGNITION DEEP LEARNING-BASED RESULT AND DISCUSSION
5.2.1 Data Sequence Augmentation
5.2.2 Training and Validating Strategy
5.2.3 Experimental Setup
5.2.4 Comparison of Input Data Result
5.2.5 Comparison of Multimodal Input Data Result
5.2.6 Real-Time Experimental Result
5.2.7 Comparison with Other Works
CHAPTER 6. CONCLUSION AND FUTURE WORKS
6.1 CONCLUSIONS
6.2 LIMITATIONS AND SUGGESTION FOR THE FUTURE WORKS
REFERENCES