National Digital Library of Theses and Dissertations in Taiwan

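Author: 維亞彥
Author (English): Axel Yann Velez
Thesis title (Chinese): 用於自然人機互動的可客製化手勢辨識系統設計
Thesis title (English): Customizable Gesture Recognition System for Natural Human-Machine Interactions
Advisor: 陳慶瀚
Advisor (English): Ching-Han Chen
Degree: Master's
Institution: National Central University
Department: Department of Computer Science and Information Engineering
Field: Engineering
Discipline: Electrical and Information Engineering
Thesis type: Academic thesis
Publication year: 2024
Academic year of graduation: 112 (ROC calendar)
Language: English
Pages: 76
Chinese keywords: 電腦視覺; 辨識; 手勢
English keywords: Computer Vision; Recognition; Gestures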
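Abstract (translated from the Chinese version): This thesis designs a gesture recognition system that enables more natural human-machine interaction. The system is based on recognizing the letters of the American Sign Language (ASL) alphabet and tracking the movement of the user's hand. It can recognize static gestures (ASL letters), composed gestures (sequences of ASL letters), and dynamic gestures (an ASL letter combined with a hand movement). We also designed support for multiple actions per gesture and provide feedback to the user. A distinguishing feature of the system is that users can flexibly add custom gestures and modify existing ones, simply by combining a sign with a specific hand movement or by defining a sequence of static signs. The study first presents the motivation for developing this customizable gesture recognition system and identifies the challenges faced by existing systems that lack flexibility. It then describes the design and implementation of the system in detail, which use advanced machine learning techniques including convolutional neural networks (CNNs), long short-term memory networks (LSTMs), and dynamic time warping (DTW). These techniques are integrated into a hierarchical, modular system architecture that can classify and recognize static, composed, and dynamic gestures. The implementation phase covers data collection and preprocessing, including extracting hand landmarks and converting them into a dataset usable for model training. The evaluation phase uses several metrics (including accuracy, recall, and F1-score) to verify the high accuracy and robustness achieved by the system. Finally, we discuss gesture personalization and different modes of human-machine interaction, verifying the system's ease of use and its potential for real-world applications.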
Abstract (English original): This thesis presents a customizable hand gesture recognition system designed for natural human-machine interaction. The system is based on the recognition of American Sign Language (ASL) letters and on tracking the user's hand movement. It can detect static signs (a single ASL letter), composed gestures (a sequence of ASL letters), and dynamic gestures (an ASL letter combined with a hand movement path). It is also designed to handle the actions associated with each gesture and to provide feedback to the user. A key feature of the system is its flexibility: the user can add gestures and modify existing ones by associating a sign with a movement path or by defining a sequence of static signs. The study begins with the motivations for developing a customizable gesture recognition system and outlines the challenges of existing systems that lack adaptability. It then details the design and implementation of the system, which uses Convolutional Neural Networks (CNNs), Long Short-Term Memory (LSTM) networks, and Dynamic Time Warping (DTW). These techniques are integrated into a modular framework capable of distinguishing and recognizing static, composed, and dynamic gestures. The implementation phase covers data collection and creation and the preprocessing pipeline, including the extraction of hand landmarks and their transformation into data usable for training the models. The evaluation phase measures accuracy, loss, and F1-score and reports high accuracy and robustness. Finally, gesture customization and the different human-machine interactions are addressed, demonstrating the system's ease of use and its real-world applications.
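The abstract names Dynamic Time Warping as one of the techniques used for hand movement paths, but this record does not describe how it is applied. As an illustration only, the following minimal sketch shows the classic DTW recurrence on 2-D point sequences and a nearest-template lookup. The function names, the template dictionary, and the sample coordinates are hypothetical and are not taken from the thesis.

```python
import math


def dtw_distance(path_a, path_b):
    """Classic dynamic time warping cost between two sequences of 2-D points.

    Runs in O(len(path_a) * len(path_b)) time; no windowing or FastDTW speed-up.
    """
    n, m = len(path_a), len(path_b)
    inf = float("inf")
    # cost[i][j] = best cumulative cost aligning the first i points of path_a
    # with the first j points of path_b.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(path_a[i - 1], path_b[j - 1])  # Euclidean distance
            cost[i][j] = d + min(
                cost[i - 1][j],      # path_a advances, path_b repeats a point
                cost[i][j - 1],      # path_b advances, path_a repeats a point
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


def closest_template(path, templates):
    """Return the name of the template with the smallest DTW cost to `path`."""
    return min(templates, key=lambda name: dtw_distance(path, templates[name]))


# Hypothetical movement templates in normalized image coordinates.
TEMPLATES = {
    "swipe_right": [(0.0, 0.0), (0.5, 0.0), (1.0, 0.0)],
    "swipe_up": [(0.0, 0.0), (0.0, 0.5), (0.0, 1.0)],
}

# A slightly noisy, unevenly sampled rightward movement.
observed = [(0.0, 0.0), (0.1, 0.0), (0.5, 0.05), (0.9, 0.0), (1.0, 0.0)]
print(closest_template(observed, TEMPLATES))
```

Because DTW lets one sequence repeat points while the other advances, the same movement performed at different speeds yields a low cost, which is why it suits comparing hand trajectories of varying duration.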
Table of Contents
Abstract
Résumé
摘要
Acknowledgements
List of Figures
List of Tables
Chapter 1. Introduction
  1.1 Background
  1.2 Motivation
  1.3 Objectives
  1.4 Thesis Structure
Chapter 2. Literature Review
  2.1 Hand Gesture Recognition
    2.1.1 Sensor-based gesture recognition
    2.1.2 Vision-based gesture recognition
  2.2 Static Sign Recognition
    2.2.1 Traditional Computer Vision
    2.2.2 Convolutional Neural Networks
    2.2.3 Vision Transformers
  2.3 Dynamic Gesture Recognition
    2.3.1 Deep Learning
    2.3.2 Dynamic Time Warping
  2.4 Synthesis
Chapter 3. System Design
  3.1 MIAT Methodology
    3.1.1 IDEF0
    3.1.2 GRAFCET
  3.2 IDEF0 Modelization
    3.2.1 System overview
    3.2.2 A1: Hand detection
    3.2.3 A2: Sign recognition
    3.2.4 A3: Path recognition
    3.2.5 A4: Gesture prediction
    3.2.6 A5: System response
  3.3 GRAFCET Modelization
    3.3.1 System overview
    3.3.2 A1: Hand detection
    3.3.3 A2: Sign recognition
    3.3.4 A3: Path recognition
    3.3.5 A4: Gesture identification
    3.3.6 A5: System response
Chapter 4. Implementation
  4.1 Environment
  4.2 Data Collection and Preprocessing
    4.2.1 ASL sign dataset
    4.2.2 Dynamic gestures dataset
  4.3 Static Sign Detection
    4.3.1 Model Architecture
    4.3.2 Training and Validation
  4.4 Composed Sign Detection
  4.5 Dynamic Sign Detection
    4.5.1 Path Recognition Model Architecture
    4.5.2 Training and Validation
    4.5.3 Dynamic Time Warping extra-validation
  4.6 Gesture Segmentation
    4.6.1 Dealing with intentionality
    4.6.2 Intentionality model
    4.6.3 Change points boosting
  4.7 Systems Integration
Chapter 5. Results and Interpretation
  5.1 Static sign recognition
  5.2 Dynamic sign recognition
  5.3 Gesture segmentation
Chapter 6. Customization and Human-Machine Interactions
  6.1 Customization
    6.1.1 Adding new hand movements
    6.1.2 Adding or modifying gestures
  6.2 HMI
    6.2.1 Action handling
    6.2.2 LLM
    6.2.3 Voice synthesis and feedback
Chapter 7. Conclusion
  7.1 Challenges and Limitations
  7.2 Future Work
  7.3 Final Thoughts
References
[1] J. Shin, A. Matsuoka, M. A. M. Hasan and A. Y. Srizon, "American Sign Language
Alphabet Recognition by Extracting Feature from Hand Pose Estimation," Sensors, vol. 21,
no. 17, 2021.
[2] M. Oudah, A. Al-Naji and J. Chahl, "Hand Gesture Recognition Based on Computer
Vision: A Review of Techniques," Journal of Imaging, vol. 6, no. 8, 2020.
[3] M. S. Amin, S. T. H. Rizvi, A. Mazzei and L. Anselma, "Assistive Data Glove for Isolated
Static Postures Recognition in American Sign Language Using Neural Network,"
Electronics, vol. 12, no. 8, 2023.
[4] M. Králik and M. Šuppa, "Waveglove: Transformer-Based Hand Gesture Recognition
Using Multiple Inertial Sensors," in 2021 29th European Signal Processing Conference
(EUSIPCO), 2021.
[5] S. Kang, H. Kim, C. Park, Y. Sim, S. Lee and Y. Jung, "sEMG-Based Hand Gesture
Recognition Using Binarized Neural Network," Sensors, vol. 23, no. 3, 2023.
[6] J. Wu, P. Ren, B. Song, R. Zhang, C. Zhao and X. Zhang, "Data glove-based gesture
recognition using CNN-BiLSTM model with attention mechanism," PLOS ONE, vol. 18,
no. 11, pp. 1-22, Nov. 2023.
[7] MTheiler, "HOG scikit-image AngelaMerkel, CC BY-SA 4.0," 2019. [Online].
Available: https://commons.wikimedia.org/wiki/File:HOG_scikit-image_AngelaMerkel.jpeg.
[8] A. R. Lubis, S. Prayudani, Y. Fatmi, Al-Khowarizmi, Julham and Y. Y. Lase, "Detection of
HOG Features on Tuberculosis X-Ray Results Using SVM and KNN," in 2021 2nd
International Conference on Innovative and Creative Information Technology (ICITech),
2021.
[9] R. Dhiman, P. Luthra and N. T. Singh, "Different Categories of Feature Extraction and
Machine Learning Classification Models Used for Hand Gesture Recognition Systems: A
Review," in 2023 IEEE 8th International Conference for Convergence in Technology
(I2CT), 2023.
[10] Advances in Computer Vision: Proceedings of the 2019 Computer Vision Conference
(CVC), vol. 1, Springer International Publishing, 2020, pp. 128-144.
[11] S. Loussaief and A. Abdelkrim, "Deep learning vs. bag of features in machine learning for
image classification," in 2018 International Conference on Advanced Systems and Electric
Technologies (IC_ASET), 2018.
[12] Aphex34, "Typical CNN, CC BY-SA 4.0," 2015. [Online]. Available:
https://commons.wikimedia.org/wiki/File:Typical_cnn.png. [Accessed 2024].
[13] K. Simonyan and A. Zisserman, "Very Deep Convolutional Networks for Large-Scale
Image Recognition," 2015. [Online]. [Accessed 2024].
[14] K. He, X. Zhang, S. Ren and J. Sun, "Deep Residual Learning for Image Recognition,"
2015. [Online]. [Accessed 2024].
[15] T.-Y. Lin, P. Goyal, R. Girshick, K. He and P. Dollár, "Focal Loss for Dense Object
Detection," arXiv [cs.CV], 2018.
[16] J. Redmon, S. Divvala, R. Girshick and A. Farhadi, "You Only Look Once: Unified,
Real-Time Object Detection," 2016. [Online]. [Accessed 2024].
[17] A. Vaswani, N. Shazeer, N. Parmar and J. Uszkoreit, "Attention Is All You Need," 2023.
[Online].
[18] C. K. Tan, K. M. Lim, C. P. Lee, R. K. Y. Chang and A. Alqahtani, "SDViT: Stacking of
Distilled Vision Transformers for Hand Gesture Recognition," Applied Sciences, vol. 13,
no. 22, 2023.
[19] A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn and X. Zhai, "An Image is
Worth 16x16 Words: Transformers for Image Recognition at Scale," 2021. [Online].
[Accessed 2024].
[20] O. Uparkar, J. Bharti, R. K. Pateriya, R. K. Gupta and A. Sharma, "Vision Transformer
Outperforms Deep Convolutional Neural Network-based Model in Classifying X-ray
Images," Procedia Computer Science, vol. 218, pp. 2338-2349, 2023.
[21] S. H. Lee, S. Lee and B. C. Song, "Vision Transformer for Small-Size Datasets," 2021.
[Online]. Available: https://arxiv.org/abs/2112.13492. [Accessed 2024].
[22] M. Sari, A. Moussaoui and A. Hadid, "Deep Learning Techniques for Colorectal Cancer
Detection: Convolutional Neural Networks vs Vision Transformers," in 2024 2nd
International Conference on Electrical Engineering and Automatic Control (ICEEAC),
2024.
[23] fdeloche, "Long Short-Term Memory, CC BY-SA 4.0," 2017. [Online]. Available:
https://commons.wikimedia.org/wiki/File:Long_Short-Term_Memory.svg. [Accessed
2024].
[24] V. Sharma, M. Jaiswal, A. Sharma, S. Saini and R. Tomar, "Dynamic Two Hand Gesture
Recognition using CNN-LSTM based networks," in 2021 IEEE International Symposium
on Smart Electronic Systems (iSES), 2021.
[25] XantaCross, "Euclidean vs DTW, CC BY-SA 3.0," [Online]. Available:
https://commons.wikimedia.org/wiki/File:Euclidean_vs_DTW.jpg.
[26] W. Li, Z. Luo and X. Xi, "Movement Trajectory Recognition of Sign Language Based on
Optimized Dynamic Time Warping," Electronics, vol. 9, no. 9, 2020.
[27] S. Salvador and P. Chan, "FastDTW: Toward Accurate Dynamic Time Warping in Linear
Time and Space," KDD Workshop on Mining Temporal and Sequential Data, pp. 70-80,
2004.
[28] C. H. Chen, M. Y. Lin and X. C. Guo, "High-level modeling and synthesis of smart sensor
networks for Industrial Internet of Things," Computers & Electrical Engineering, vol. 61,
pp. 48-66, 2017.
[29] A. Presley and D. Liles, "The Use of IDEF0 for the Design and Specification of
Methodologies," Oct. 1998. [Online].
[30] M. Blanchard, P. Brard and J. P. Frachet, "A modeling Tool for the Specifications of a
Logical Automatic System: The Grafcet (France)," IFAC Proceedings Volumes, vol. 13, no.
11, pp. 521-529, 1980.
[31] Ssire, "Graf7_02.png, CC BY-SA 3.0," 8 July 2004. [Online]. Available:
https://commons.wikimedia.org/wiki/File:Graf7_02.png.
[32] C. Lugaresi, J. Tang, H. Nash, C. McClanahan, E. Uboweja, M. Hays, F. Zhang, M. Georg
and M. Grundmann, "MediaPipe: A Framework for Building Perception Pipelines," 2019.
[33] A. Nagarah, "ASL Alphabet [Dataset]," 2018. [Online]. Available:
https://www.kaggle.com/dsv/29550. [Accessed October 2023].
[34] Meta AI, "Introducing Meta Llama 3: The most capable openly available LLM to date,"
2024. [Online]. Available: https://ai.meta.com/blog/meta-llama-3/.