
National Digital Library of Theses and Dissertations in Taiwan


Detailed Record

Researcher: 謝孟荃
Researcher (English): HSIEH, MENG-CHUAN
Thesis Title: 一種融合深度學習之強化式特徵參數設計於生物辨識應用之研究
Thesis Title (English): A Study on Enhanced Feature Parameter Designs Combined with Deep Learning for Biometric Recognition
Advisor: 丁英智
Advisor (English): DING, ING-JR
Committee Members: 楊振雄、鄭旭志
Committee Members (English): YANG, CHEN-HSIUNG; CHENG, HSU-CHIH
Oral Defense Date: 2022-08-25
Degree: Master's
Institution: 國立虎尾科技大學 (National Formosa University)
Department: Master's Program, Department of Electrical Engineering
Discipline: Engineering
Academic Field: Electrical and Computer Engineering
Thesis Type: Academic thesis
Year of Publication: 2022
Graduation Academic Year: 110 (2021–2022)
Language: Chinese
Number of Pages: 123
Chinese Keywords: 特徵強化、生成對抗網路、物件偵測、卷積神經網路、生物辨識
English Keywords: Feature enhancement; Generative adversarial network; Object detection; Convolutional neural network; Biometric recognition
Usage statistics:
  • Cited by: 0
  • Views: 49
  • Downloads: 0
  • Bookmarks: 0

Most biometric recognition systems on the market are single-mode systems based on, for example, the face, voice, fingerprint, or iris. Although a single class of features can be accurate enough, a single modality is relatively easy to forge. This thesis therefore proposes a series of feature enhancement methods to strengthen biometric recognition applications. Feature enhancement adds some computational complexity, but it effectively improves recognition accuracy. The proposed methods are applied mainly to human identity recognition and hand gesture recognition.
The proposed feature enhancement methods fall into three categories. The first is feature parameter enhancement by feature fusion design, in which the feature parameters of two different modalities are fused through canonical correlation analysis (CCA); a variety of feature extraction methods are studied for this fusion. Three-dimensional facial features are fused with LPCC voice features for identity recognition; two kinds of hand motion features, YOLO deep learning features and HOG features, are fused for hand gesture recognition; and face VGG16 deep learning model features are fused with face DWT-DCT features for identity recognition.
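To make the fusion step concrete, the following is a minimal Python sketch of CCA-based feature fusion using scikit-learn's CCA. The feature matrices, their dimensions, and the concatenation/summation fusion rules are illustrative assumptions, not the thesis's exact design.

```python
# Minimal sketch of CCA-based feature fusion (illustrative, not the thesis code).
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 64))    # stand-in for facial features (hypothetical dims)
Y = rng.normal(size=(200, 12))    # stand-in for LPCC voice features (hypothetical dims)

# Project both modalities into a shared space where their correlation is maximized.
cca = CCA(n_components=8)
Xc, Yc = cca.fit_transform(X, Y)  # canonical variates of each modality

# Two common fusion rules over the canonical variates:
fused_serial = np.hstack([Xc, Yc])  # serial fusion by concatenation -> shape (200, 16)
fused_sum = Xc + Yc                 # additive fusion               -> shape (200, 8)
```

The fused vectors would then feed a conventional classifier for identity or gesture recognition.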
The second category is deep learning-based feature enhancement that eliminates noise: generative adversarial networks (GANs) are applied to remove noise from face and voice data. Noise is often hard to avoid when recording data; for face images and speech, the typical degradations are motion blur and acoustic interference. Here, DeblurGAN and SEGAN are used to deblur face images and to denoise recorded speech, respectively. This category is applied mainly to identity recognition.
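The PyTorch sketch below shows the enhancement-GAN training idea shared by SEGAN-style denoisers: a generator maps a degraded signal to a clean one, while a discriminator scores (degraded, candidate) pairs. The toy networks, loss weights, and random tensors are assumptions for illustration; the published SEGAN and DeblurGAN architectures are substantially larger.

```python
# Schematic enhancement-GAN training step (toy networks, not SEGAN/DeblurGAN).
import torch
import torch.nn as nn

G = nn.Sequential(nn.Conv1d(1, 16, 31, padding=15), nn.PReLU(),
                  nn.Conv1d(16, 1, 31, padding=15))        # noisy signal -> enhanced signal
D = nn.Sequential(nn.Conv1d(2, 16, 31, stride=4), nn.LeakyReLU(0.2),
                  nn.Flatten(),
                  nn.Linear(3984, 1))                      # 16 channels x 249 steps after the strided conv

opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

noisy = torch.randn(8, 1, 1024)   # stand-in for noisy speech frames
clean = torch.randn(8, 1, 1024)   # paired clean targets

# Discriminator step: real pair (noisy, clean) vs. fake pair (noisy, G(noisy)).
fake = G(noisy).detach()
loss_d = bce(D(torch.cat([noisy, clean], 1)), torch.ones(8, 1)) \
       + bce(D(torch.cat([noisy, fake], 1)), torch.zeros(8, 1))
opt_d.zero_grad(); loss_d.backward(); opt_d.step()

# Generator step: fool D, plus an L1 term pulling the output toward the clean target.
enhanced = G(noisy)
loss_g = bce(D(torch.cat([noisy, enhanced], 1)), torch.ones(8, 1)) \
       + 100.0 * nn.functional.l1_loss(enhanced, clean)
opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```

After training, only the generator is kept and run over noisy recordings (or blurred images, in the 2D case) before feature extraction and recognition.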
The third category is hand pose recognition with noise-eliminating fast hand feature extraction. The hand image is segmented into finger segments, which are then detected and checked, allowing 10 kinds of hand poses to be identified quickly.
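A classic way to implement this finger-segmentation idea is contour analysis with convexity defects, as in the OpenCV sketch below. It assumes a hypothetical binary hand mask and counts extended fingers from deep, acute valleys between fingertips; the thesis's full pipeline (palm center, wrist line, per-finger identification) goes further than this sketch.

```python
# Hedged OpenCV sketch: count extended fingers via contour convexity defects.
import cv2
import numpy as np

def count_fingers(mask: np.ndarray) -> int:
    """mask: binary (0/255) uint8 image whose largest blob is the hand."""
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return 0
    hand = max(contours, key=cv2.contourArea)        # assume largest contour is the hand
    hull = cv2.convexHull(hand, returnPoints=False)  # hull as contour-point indices
    defects = cv2.convexityDefects(hand, hull)
    if defects is None:
        return 0
    valleys = 0
    for start, end, far, depth in defects[:, 0]:
        a = np.linalg.norm(hand[start][0] - hand[far][0])
        b = np.linalg.norm(hand[end][0] - hand[far][0])
        c = np.linalg.norm(hand[start][0] - hand[end][0])
        cos_angle = np.clip((a * a + b * b - c * c) / (2 * a * b + 1e-9), -1.0, 1.0)
        # A deep, acute valley usually lies between two extended fingers.
        if np.arccos(cos_angle) < np.pi / 2 and depth > 10000:  # depth is fixed-point (value*256)
            valleys += 1
    return valleys + 1 if valleys else 0             # n valleys ~ n + 1 extended fingers

# Usage (hypothetical input):
# mask = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)[1]
# print(count_fingers(mask))
```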
In the experiments, for feature parameter enhancement by feature fusion design, fusing the three-dimensional facial features with the LPCC voice features raised the identity recognition rate from 70.88% for the best single mode (LPCC) to 75.68%. Fusing the two hand motion features, YOLO and HOG, improved the gesture recognition rate from 83.2% for the best single mode (HOG) to 90.8%. Fusing the face VGG16 deep learning model features with the face DWT-DCT features raised the identity recognition rate from 91.95% for the best single mode (DWT-DCT) to 94.6%. For the deep learning-based noise-eliminating feature enhancement, adding mechanical running noise to the speech reduced the identity recognition rate to 29.2%, and denoising raised it to 63.2%. For face images, motion blur reduced the recognition rate to 75.88%, and deblurring restored it to 90.53%, very close to the 92% obtained on the original noise-free images. Noise elimination is thus highly effective for both speech-based and face-based identity recognition. In the experiments on hand pose recognition with noise-eliminating fast hand feature extraction, the recognition rate for the 10 designed hand poses reached 100%, and recognizing one hand image took only about 0.6 seconds.

Abstract (Chinese)...i
Abstract (English)...ii
Acknowledgments...iv
Table of Contents...v
List of Tables...vii
List of Figures...xii
Chapter 1 Introduction...1
1.1 Research Motivation...1
1.2 Literature Review and Discussion...1
1.3 Chapter Overview...2
Chapter 2 Research Methods...3
2.1 YOLO...3
2.2 Generative Adversarial Network (GAN)...5
2.3 Canonical Correlation Analysis (CCA)...6
2.4 DWT-DCT...7
2.4.1 Discrete Wavelet Transform (DWT)...7
2.4.2 Discrete Cosine Transform (DCT)...8
2.5 Convolutional Neural Network (CNN)...10
2.6 Dynamic Time Warping (DTW)...11
Chapter 3 Enhanced Feature Parameter Designs Combined with Deep Learning for Human Biometric Recognition...12
3.1 Feature Parameter Enhancement by Feature Fusion Design...12
3.1.1 Identity Recognition by Feature Fusion of Face and Voice Features...12
3.1.2 Hand Gesture Recognition by Feature Fusion of Hand Motion Features...15
3.1.2.1 Deep Learning-Based Object Detection...15
3.1.2.2 Histogram of Oriented Gradients (HOG)...18
3.1.3 Identity Recognition by Feature Fusion of Deep Learning Model Features and Raw Signal Features...20
3.2 Deep Learning-Based Noise-Eliminating Feature Enhancement...23
3.2.1 Deep Learning-Based Speech Enhancement...23
3.2.2 Deep Learning-Based Image Deblurring...26
3.2.2.1 Motion Blur...27
3.2.2.2 DeblurGAN...28
3.3 Hand Pose Recognition via Noise-Eliminating Fast Hand Feature Extraction...29
3.3.1 Hand Contour Detection...29
3.3.2 Locating the Palm Center, Wrist Line, and Fingers...31
3.3.3 Finger Detection and Identification...34
3.3.4 Simple Hand Pose Recognition...34
Chapter 4 Experimental Results, Analysis, and Comparison...35
4.1 Feature Enhancement Experiments with Feature Fusion...35
4.1.1 Feature Fusion Experiments on Face and Voice Features...35
4.1.1.1 Face and Voice Feature Database Construction...35
4.1.1.2 Feature Enhancement Results on Face and Voice Features...40
4.1.2 Feature Enhancement Experiments on Hand Motion Features...50
4.1.2.1 Hand Motion Database Construction...52
4.1.2.2 Feature Enhancement Results on Hand Motion Features...57
4.1.3 Feature Enhancement Experiments on Deep Learning Model Features and Raw Signal Features...66
4.1.3.1 Face Image and Speech Spectrogram Database Construction...66
4.1.3.2 Feature Enhancement Results on Deep Learning Model Features and Raw Signal Features...70
4.2 Noise Elimination Experiments on Images and Speech...76
4.2.1 Generative Adversarial Network Model Training...76
4.2.2 Noise Elimination Experiment Database Construction...79
4.2.3 Noise Elimination Experiment Results...80
4.3 Noise-Eliminating Fast Hand Feature Extraction Experiments...96
Chapter 5 Conclusion...100
References...101
Appendix...105
Extended Abstract...119



Electronic Full Text (publicly available online from 2027-09-20)