Author: 紀柏廷 (Po-Ting Chi)
Thesis title: 基於特徵金字塔與三元損失組之單級人臉偵測與人臉辨識神經網路
Thesis title (English): A Single-Stage Face Detection and Face Recognition Deep Neural Network Based on Feature Pyramid and Triplet Loss
Advisor: 蔡宗漢 (Tsung-Han Tsai)
Degree: Master's
Institution: National Central University (國立中央大學)
Department: Department of Electrical Engineering
Discipline: Engineering
Field: Electrical Engineering and Computer Science
Year of publication: 2020
Academic year of graduation: 108 (ROC calendar)
Language: Chinese
Number of pages: 57
Keywords: Image Processing; Neural Network; Deep Learning; Face Detection; Face Recognition; Multi-task Learning
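Abstract (translated from the Chinese):
As technology has advanced, artificial intelligence has evolved continuously: from the philosophical ideas about AI that emerged in the 1950s to the rise of machine learning in the 1980s, when algorithms such as decision trees, random forests, support vector machines, and neural networks were proposed and then repeatedly refined to improve their performance. In the deep learning boom of the past decade, deep neural networks, accelerated by GPUs and other convolution-acceleration hardware, have achieved significant improvements on a wide range of tasks.
A practical face recognition system, from camera input to identity output, can be divided into four main tasks: face detection, face alignment, feature extraction, and feature matching. Running every one of these tasks on the original image is very time-consuming. With neural network optimization, face detection and face alignment can be merged into a single face detection network, which localizes faces by combining a feature pyramid with anchor boxes and aligns them through the network's regression layer. Feature extraction and feature matching can likewise be merged into a face recognition network, which extracts features by convolution and matches them through a fully connected layer and a softmax function.
This thesis proposes a multi-task learning method that combines a feature pyramid with triplet loss to train a single-stage deep neural network for face detection and face recognition. A single main backbone network outputs the results of all tasks at once, and sharing the convolutional weights avoids repeating computation across tasks. The network localizes faces with the feature pyramid and anchor boxes and outputs face features trained with triplet loss; a simple mathematical function then compares feature similarity to produce the recognition result. Accelerated by an Nvidia RTX 2080Ti, the system reaches 212 FPS on 640x640 input images.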
Abstract (English):
As technology has developed, artificial intelligence algorithms have continued to evolve, from the various artificial intelligence ideas first proposed in the 1950s to the rise of machine learning in the 1980s. Algorithms such as decision trees, random forests, support vector machines, and neural networks were proposed and further improved to enhance their performance. With the surge of deep learning over the past decade, and with acceleration from GPUs and other hardware, deep neural networks have achieved significant improvements on a variety of tasks.
A practical deep learning face recognition system can be divided into four main tasks: face detection, face alignment, feature extraction, and feature matching. The pipeline can be time-consuming if every task is executed with the original image as its input. With deep neural network optimization, the face detection and face alignment tasks can be integrated into a single detection network, which localizes faces by combining a feature pyramid with anchor boxes and aligns the face position by training the network's regression layer. The feature extraction and feature matching tasks can then be combined as well, using convolution to extract the face features and a fully connected layer with a softmax function to match the person's identity.
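The localization step described above (a feature pyramid combined with anchor boxes, refined by a regression layer) can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration and is not taken from the thesis: the strides (8, 16, 32), the square anchor sizes per level, the function names `generate_anchors` and `decode_box`, and the offset parameterization, which is one common convention among single-stage detectors.

```python
import math
from itertools import product


def generate_anchors(image_size=640, strides=(8, 16, 32),
                     sizes=((16, 32), (64, 128), (256, 512))):
    """Return (cx, cy, w, h) anchors in pixels for every pyramid level.

    The strides and anchor sizes are hypothetical defaults, not values
    reported in the thesis.
    """
    anchors = []
    for stride, level_sizes in zip(strides, sizes):
        cells = math.ceil(image_size / stride)  # feature map side length
        for row, col in product(range(cells), repeat=2):
            # Anchor centre sits in the middle of the feature map cell.
            cx = (col + 0.5) * stride
            cy = (row + 0.5) * stride
            for size in level_sizes:
                anchors.append((cx, cy, float(size), float(size)))
    return anchors


def decode_box(anchor, offsets):
    """Apply regression offsets (dx, dy, dw, dh) to an anchor.

    Returns corner coordinates (x1, y1, x2, y2). The parameterization
    (centre shift scaled by anchor size, log-scale width and height) is
    an assumed, commonly used convention.
    """
    acx, acy, aw, ah = anchor
    dx, dy, dw, dh = offsets
    cx = acx + dx * aw
    cy = acy + dy * ah
    w = aw * math.exp(dw)
    h = ah * math.exp(dh)
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


anchors = generate_anchors()
print(len(anchors))
print(decode_box(anchors[0], (0.0, 0.0, 0.0, 0.0)))
```

With these assumed settings, a 640x640 input yields 80x80, 40x40, and 20x20 grids; small anchors on the fine grid cover small faces, while large anchors on the coarse grid cover large ones.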
In this thesis, we propose a multi-task training method based on a feature pyramid and triplet loss to train a single-stage deep neural network for face detection and face recognition. The data for every task passes through the same backbone network, so sharing weights and computation avoids duplicate work. The whole network uses the feature pyramid and anchor boxes to localize the face position, uses triplet loss to train the feature extractor, and finally matches the features through a simple mathematical function. On an Nvidia RTX 2080Ti GPU, the system achieves 212 FPS for 640x640 resolution input.
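As a rough illustration of the triplet loss and the similarity matching mentioned above, the following sketch uses the widely known squared-distance form of triplet loss and cosine similarity for matching. The abstract does not say which similarity function or margin the thesis uses, so the margin of 0.2, the threshold of 0.5, the choice of cosine similarity, and the names `triplet_loss` and `identify` are all assumptions made for illustration.

```python
import math


def squared_distance(u, v):
    """Squared Euclidean distance between two embeddings."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """max(0, d(a, p)^2 - d(a, n)^2 + margin); the margin is hypothetical."""
    return max(0.0, squared_distance(anchor, positive)
               - squared_distance(anchor, negative) + margin)


def cosine_similarity(u, v):
    """Cosine of the angle between two embeddings (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def identify(query, gallery, threshold=0.5):
    """Return the gallery name most similar to the query embedding,
    or None if no similarity exceeds the (assumed) threshold."""
    best_name, best_score = None, threshold
    for name, embedding in gallery.items():
        score = cosine_similarity(query, embedding)
        if score > best_score:
            best_name, best_score = name, score
    return best_name


# Toy 2-D embeddings; real face embeddings have many more dimensions.
gallery = {"alice": (1.0, 0.0), "bob": (0.0, 1.0)}
print(triplet_loss((1.0, 0.0), (1.0, 0.0), (0.0, 1.0)))  # well-separated triplet
print(identify((1.0, 0.1), gallery))
```

The loss is zero once the negative is farther from the anchor than the positive by at least the margin, which is what lets recognition reduce to a plain distance or similarity comparison at inference time.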
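Table of contents:
Abstract (Chinese) I
Abstract (English) II
Acknowledgments III
1. Introduction 1
1.1. Research Background and Motivation 1
1.2. Thesis Organization 4
2. Literature Review 5
2.1. Face Detection 5
2.2. Face Recognition 8
2.3. Multi-task Learning 10
2.4. Face Segmentation Task 13
3. Network Model Design 15
3.1. Face Detection Dataset 15
3.2. Face Recognition Dataset 17
3.3. Face Segmentation Dataset 18
3.4. Segmentation Model Selection and Results 19
3.5. Generating the Synthetic Data Required by the Integrated Network 21
3.6. Network Design 22
4. Training Strategy Design and Process for Single-Stage Face Recognition 26
4.1. Image Preprocessing 26
4.2. Network Training Parameters 27
4.3. Training Process 28
4.4. Network Post-processing 32
4.5. Training Environment 33
5. Network Implementation and Discussion of Results 34
5.1. Face Detection Validation Results 34
5.2. Face Recognition Validation Results 36
5.3. System Speed 38
6. Conclusion 40
References 41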
Electronic full text: publicly available online from 2022-08-01