National Digital Library of Theses and Dissertations in Taiwan

Detailed Record

Author: 莊進智 (Christopher Chuang)
Title: 基於深度學習的手語識別和評分系統 (Deep Learning Based Sign Language Recognition and Scoring Systems)
Advisors: 張志勇 (Chih-Yung Chang), 郭經華 (Chin-Hwa Kuo)
Thesis committee: 廖文華, 武士戎, 石貴平, 蒯思齊
Oral defense date: 2024-06-06
Degree: Doctoral
Institution: 淡江大學 (Tamkang University)
Department: 資訊工程學系博士班 (Doctoral Program, Department of Computer Science and Information Engineering)
Discipline: Engineering
Field: Electrical and Computer Engineering
Document type: Academic dissertation
Publication year: 2024
Academic year of graduation: 112 (2023–2024)
Language: English
Pages: 55
Keywords (Chinese): 卷積型長短期記憶網絡, 深度學習, 孿生長短期記憶網絡, 手語識別
Keywords (English): ConvLSTM, Deep Learning, Siamese LSTM, Sign language recognition
DOI: 10.6846/tku202400408
Abstract

According to the World Health Organization (WHO) [1], more than 5% of the global population requires rehabilitation or assistance for hearing loss. Sign language is the primary means of communication for the deaf and hard-of-hearing community, but existing recognition and teaching products are of limited effectiveness. Current models struggle with the complex semantic context of sign language gestures and with recognizing subtle finger movements. This dissertation proposes a Sign Language Teaching and Scoring System (STSS) that combines a Siamese Long Short-Term Memory (LSTM) network for coarse-grained classification with a Convolutional LSTM (ConvLSTM) for fine-grained classification. The Siamese LSTM analyzes temporally and spatially normalized key point data and quickly computes the similarity between a sample video and a standardized reference video. It is trained with an adaptive contrastive loss that adjusts dynamically according to the measured similarity, helping the model focus on challenging gestures that are similar yet distinct. The ConvLSTM then performs further analysis on data points whose similarity exceeds a given threshold. Compared with other mechanisms, the proposed STSS achieves superior precision, recall, and F1-Score.
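To make the coarse-to-fine pipeline concrete, below is a minimal PyTorch sketch of the Siamese LSTM stage: a shared encoder embeds the sample and reference key-point sequences, a contrastive loss trains the embedding, and a similarity threshold gates candidates toward the fine-grained ConvLSTM stage. This is an illustrative sketch, not the dissertation's implementation: the feature dimension (543 key points, the MediaPipe Holistic landmark count, times 3 coordinates), hidden sizes, margin, and the 0.5 threshold are placeholder assumptions, and the fixed-margin loss below stands in for the adaptive contrastive loss described in the abstract.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SiameseLSTMEncoder(nn.Module):
    # One tower of the Siamese pair; the sample and reference videos
    # are encoded with the same shared weights.
    def __init__(self, feature_dim=543 * 3, hidden_dim=128, embed_dim=64):
        super().__init__()
        self.lstm = nn.LSTM(feature_dim, hidden_dim, batch_first=True)
        self.proj = nn.Linear(hidden_dim, embed_dim)

    def forward(self, seq):           # seq: (batch, frames, feature_dim)
        _, (h_n, _) = self.lstm(seq)  # final hidden state summarizes the clip
        return self.proj(h_n[-1])     # (batch, embed_dim) embedding

def contrastive_loss(z1, z2, is_same, margin=1.0):
    # Fixed-margin contrastive loss: pulls same-sign pairs together and
    # pushes different-sign pairs at least `margin` apart. The dissertation
    # instead adapts this term dynamically to the measured similarity.
    dist = F.pairwise_distance(z1, z2)
    return torch.mean(is_same * dist.pow(2)
                      + (1 - is_same) * F.relu(margin - dist).pow(2))

# Coarse-grained gating: pairs whose similarity clears a (placeholder)
# threshold are handed to the separately trained ConvLSTM stage.
encoder = SiameseLSTMEncoder()
sample = torch.randn(4, 30, 543 * 3)      # 4 clips, 30 frames, flattened key points
reference = torch.randn(4, 30, 543 * 3)   # normalized reference sequences
z1, z2 = encoder(sample), encoder(reference)
similarity = 1.0 / (1.0 + F.pairwise_distance(z1, z2))  # distance mapped into (0, 1]
to_fine_stage = similarity > 0.5          # boolean mask of fine-stage candidates

During training, is_same would be a 0/1 tensor marking whether the two clips show the same sign; at inference only the gating path runs, and flagged clips proceed to the ConvLSTM classifier.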
Outline
Chapter 1. Introduction
1.1 Research Goals
1.2 Organization of the Dissertation
Chapter 2. Related Work
2.1 Machine Learning
2.1.1 Hidden Markov Model (HMM)
2.1.2 K-Nearest Neighbor (KNN)
2.1.3 Support Vector Machine (SVM)
2.2 Deep Learning
2.2.1 Convolutional Neural Network (CNN)
2.2.2 Graph Convolutional Network (GCN)
2.2.3 Long Short-Term Memory (LSTM)
2.2.4 Hybrid Networks
2.2.5 Principal Component Analysis Network (PCANet)
Chapter 3. Preliminary
3.1 MediaPipe for Key Point Recognition
3.2 Siamese Neural Network Architecture
3.3 Long Short-Term Memory (LSTM) Network
3.4 Convolutional LSTM (ConvLSTM) Network
Chapter 4. Notations, Assumptions, and Problem Description
4.1 Notations and Assumptions
4.2 Problem Description
4.3 Objective
Chapter 5. The Proposed Sign Language Teaching and Scoring System (STSS)
5.1 Data Preprocessing
5.1.1 Input Video Segmentation
5.1.2 Key Point Extraction
5.1.3 Temporal Normalization
5.1.4 Spatial Normalization
5.2 Coarse-Grained Classification Using the Siamese LSTM Model
5.3 Fine-Grained Classification Using the ConvLSTM Model
5.4 Summary
Chapter 6. Performance Evaluation
6.1 Dataset
6.2 Simulation Results
6.3 Summary
Chapter 7. Conclusion and Future Work
References

List of Figures
Fig. 3.1. Key point coordinates extracted for each hand with MediaPipe
Fig. 3.2. Key point coordinates extracted for the face and upper body
Fig. 3.3. LSTM cell structure
Fig. 3.4. Convolutional kernel operations over an image
Fig. 5.1. The architecture of the proposed STSS mechanism
Fig. 5.2. Input video segmentation process
Fig. 5.3. Architecture of the Siamese LSTM
Fig. 5.4. Architecture of the ConvLSTM
Fig. 6.1. Training set distribution
Fig. 6.2. Testing set distribution
Fig. 6.3. Impact of sampling frame interval on accuracy and average classification time
Fig. 6.4. Confusion matrix for each sign language category
Fig. 6.5. Varying threshold and layer counts in relation to recall, precision, and F1-Score
Fig. 6.6. ROC curves for the proposed STSS and TAM models
Fig. 6.7. Comparison of the proposed STSS, ML-CNN, and TAM in terms of precision, recall, and F1-Score
References
[1]World Health Organization, “Deafness and Hearing Loss,” World Health Organization, https://www.who.int/news-room/fact-sheets/detail/deafness-and-hearing-loss, 2024.
[2]K. Kudrinko et al., "Wearable sensor-based sign language recognition: A comprehensive review," IEEE Reviews in Biomedical Engineering, vol. 14, pp. 82–97, 2020.
[3]L. E. Baum and T. Petrie, “Statistical inference for probabilistic functions of finite state Markov chains,” Ann. Math. Statistics, vol. 37, no. 6, pp. 1554–1563, 1966.
[4]T. Starner, J. Weaver, and A. Pentland, “Real-time American sign language recognition using desk and wearable computer based video,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 20, no. 12, pp. 1371–1375, 1998.
[5]H.-L. Lou, "Implementing the Viterbi algorithm," IEEE Signal Process. Mag., pp. 42–52, 1995.
[6]X. Liu et al., “3D skeletal gesture recognition via hidden states exploration,” IEEE Trans. Image Process., vol. 29, pp. 4583–4597, 2020.
[7]G. Fang, W. Gao, X. Chen, C. Wang, and J. Ma, "Signer-independent continuous sign language recognition based on SRN/HMM," Proc. Int. Gesture Workshop, pp. 76–85, 2001.
[8]R.-H. Liang and M. Ouhyoung, "A real-time continuous gesture recognition system for sign language," Proc. Third IEEE International Conference on Automatic Face and Gesture Recognition, Nara, Japan, pp. 558–567, 1998.
[9]N. Tubaiz, T. Shanableh, and K. Assaleh, "Glove-based continuous Arabic sign language recognition in user-dependent mode," IEEE Trans. Human-Mach. Syst., vol. 45, no. 4, pp. 526–533, 2015.
[10]E. Alpaydin, Introduction to Machine Learning, MIT Press, 2010.
[11]J. Wu, L. Sun, and R. Jafari, “A wearable system for recognizing American sign language in real-time using IMU and surface EMG sensors,” IEEE J. Biomed. Heal. Informat., vol. 20, no. 5, pp. 1281–1290, 2016.
[12]W. Aly, S. Aly, and S. Almotairi, "User-independent American Sign Language alphabet recognition based on depth image and PCANet features," IEEE Access, vol. 7, pp. 123138–123150, 2019.
[13]H. Luqman, "An efficient two-stream network for isolated sign language recognition using accumulative video motion," IEEE Access, vol. 10, pp. 93785–93798, 2022.
[14]O. Koller, N. C. Camgoz, H. Ney, and R. Bowden, “Weakly Supervised Learning with Multi-Stream CNN-LSTM-HMMs to Discover Sequential Parallelism in Sign Language Videos,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 42, no. 9, pp. 2306–2320, 2020.
[15]J. Forster, C. Schmidt, T. Hoyoux, O. Koller, U. Zelle, J. Piater, and H. Ney, “RWTH-PHOENIX-Weather: A large vocabulary sign language recognition and translation corpus,” Proc. Int. Conf. Language Resources Eval., pp. 3785–3789, 2012.
[16]K. Lin, X. Wang, L. Zhu, B. Zhang, and Y. Yang, "SKIM: Skeleton-based isolated sign language recognition with part mixing," IEEE Transactions on Multimedia, vol. 26, pp. 4271–4280, 2024.
[17]J. Huang, W. Zhou, H. Li, and W. Li, "Attention-based 3D-CNNs for large-vocabulary sign language recognition," IEEE Trans. Circuits Syst. Video Technol., vol. 29, no. 9, pp. 2822–2832, 2019.
[18]M. Al-Hammadi, G. Muhammad, W. Abdul, M. Alsulaiman, M. A. Bencherif, and M. A. Mekhtiche, "Hand gesture recognition for sign language using 3DCNN," IEEE Access, vol. 8, pp. 79491–79509, 2020.
[19]Z. Wang et al., "Hear sign language: A real-time end-to-end sign language recognition system," IEEE Transactions on Mobile Computing, vol. 21, no. 7, pp. 2398–2410, 2022.
[20]M. A. Bencherif et al., "Arabic sign language recognition system using 2D hands and body skeleton data," IEEE Access, vol. 9, pp. 59612–59627, 2021.
[21]H. Zhou, W. Zhou, Y. Zhou, and H. Li, "Spatial-temporal multi-cue network for sign language recognition and translation," IEEE Transactions on Multimedia, vol. 24, pp. 768–779, 2021.
[22]A. Graves and J. Schmidhuber, “Framewise phoneme classification with bidirectional LSTM and other neural network architectures,” Neural Netw., vol. 18, no. 5–6, pp. 602–610, 2005.
[23]K. Cho et al., "Learning phrase representations using RNN encoder-decoder for statistical machine translation," Proc. Conf. Empir. Methods Nat. Lang. Process., pp. 1724–1734, 2014.
[24]B. Fang, J. Co, and M. Zhang, "DeepASL: Enabling ubiquitous and non-intrusive word and sentence-level sign language translation," Proc. 15th ACM Conf. Embedded Netw. Sensor Syst., pp. 1–13, 2017.
[25]E. Rakun, A. M. Arymurthy, L. Y. Stefanus, A. F. Wicaksono, and I. W. W. Wisesa, "Recognition of sign language system for Indonesian language using long short-term memory neural networks," Adv. Sci. Lett., vol. 24, no. 2, pp. 999–1004, 2018.
[26]N. Heidari and A. Iosifidis, "Temporal attention-augmented graph convolutional network for efficient skeleton-based human action recognition," Proc. 25th IEEE International Conference on Pattern Recognition, Milan, Italy, pp. 7907–7914, 2021.
[27]G. A. Prasath and K. Annapurani, "Prediction of sign language recognition based on multi layered CNN," Multimedia Tools and Applications, vol. 82, no. 19, pp. 29649–29669, 2023.
[28]Y. Yang and D. Ramanan, "Articulated pose estimation with flexible mixtures-of-parts," Proc. CVPR, pp. 1385–1392, 2011.
[29]M. Basavarajaiah, "6 basic things to know about convolution," Medium.com, https://medium.com/@bdhuma/6-basic-things-to-know-about-convolution-daef5e1bc411, 2019.
[30]I. Pisa et al., "Denoising Autoencoders and LSTM-Based Artificial Neural Networks Data Processing for Its Application to Internal Model Control in Industrial Environments—The Wastewater Treatment Plant Control Case," Sensors, vol. 20, no. 13, p. 3743, 2020.
[31]Google AI, "MediaPipe Solutions Guide," Google AI for Developers, https://ai.google.dev/edge/mediapipe/solutions/vision/hand_landmarker, 2024.
[32]Google AI, "MediaPipe Solutions Guide," Google AI for Developers, https://ai.google.dev/edge/mediapipe/solutions/vision/pose_landmarker, 2024.