National Digital Library of Theses and Dissertations in Taiwan

Student: 林民麒 (Min-Chi Lin)
Title: 應用於智慧型自走載具之基於YOLO深度學習網路的人群人臉方向辨識技術設計與實現
Title (English): Design and Implementation of YOLO Deep Learning Network Based Crowd Facial Direction Detection Technology for Intelligent Self-propelled Vehicles
Advisor: 范志鵬 (Chih-Peng Fan)
Committee members: 黃穎聰 (Yin-Tsung Hwang), 賴信志 (Shin-Chi Lai)
Oral defense date: 2019-07-18
Degree: Master
University: National Chung Hsing University (國立中興大學)
Department: Department of Electrical Engineering
Discipline: Engineering
Field: Electrical and Computer Engineering
Thesis type: Academic thesis
Publication year: 2019
Graduation academic year: 107
Language: Chinese
Number of pages: 58
Keywords (Chinese): 深度學習、物件偵測、YOLO、人臉方向辨識
Keywords (English): deep learning; object detection; YOLO; face direction identification
Usage statistics:
  • Cited by: 4
  • Views: 884
  • Downloads: 67
  • Saved to bookshelf: 0
With the rise of the AI boom, training deep learning models to detect and classify objects has become increasingly common, and advances in hardware have enabled ever closer integration of AI with embedded platforms; intelligent self-propelled vehicles are one popular application. The proposed system can run on an intelligent self-propelled vehicle, analyzing the camera's live images to detect pedestrians ahead and identify the direction their faces are turned.

To realize a deep learning model for face direction identification, this thesis selects pedestrian images from three public databases and annotates the faces; to strengthen the training of mirror-image classes, the images are also flipped horizontally to enlarge the dataset. For the model architecture, the darknet deep learning framework is adopted, with YOLOv2 and tiny YOLOv3 serving as base architectures that are extended and compared. The model takes three-channel RGB images as input, extracts features through a stack of convolution and pooling layers, and then computes bounding-box and class predictions from the resulting feature maps.
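The flip-based augmentation described above can be sketched as follows. This is a minimal illustration, not the thesis code: it assumes YOLO-style label lines (`class x_center y_center width height`, normalized to [0, 1]) and hypothetical class ids for the two mirror-image face directions (here 1 = "face left", 2 = "face right"); the actual ids used in the thesis may differ.

```python
# Hypothetical mapping of mirror-image class ids: when a picture is
# flipped horizontally, a left-facing face becomes a right-facing one.
MIRROR_SWAP = {1: 2, 2: 1}

def flip_annotation(line: str) -> str:
    """Flip one YOLO-format label line horizontally.

    The x-center is mirrored about the image midline (x -> 1 - x),
    and mirror-image classes are swapped so the label still matches
    the flipped picture; y, width, and height are unchanged.
    """
    cls, x, y, w, h = line.split()
    cls = str(MIRROR_SWAP.get(int(cls), int(cls)))
    x = f"{1.0 - float(x):.6f}"
    return " ".join([cls, x, y, w, h])

# A right-facing face on the left side of the image becomes a
# left-facing face on the right side after flipping.
print(flip_annotation("2 0.250000 0.500000 0.100000 0.200000"))
```

Swapping the class id together with the geometry is the point of the exercise: without it, every flipped sample would teach the network the wrong direction for the mirror-image classes.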

The experimental results show that replacing the class softmax with a logistic classifier successfully reduces misclassification between mirror-image classes. Under identical parameter settings, YOLOv2_logistic achieves the highest recall, precision, and mAP on the webcam test set, at 85%, 81%, and 86.28% respectively, all above 80%; tiny YOLOv3 (two scale) comes second, with recall, precision, and mAP of 81%, 80%, and 78.83%, a gap attributable to YOLOv2_logistic having more convolution layers than tiny YOLOv3. The larger layer count, however, costs speed: in the fps tests the tiny YOLOv3 variants perform best, reaching 30 fps on Xavier and over 20 fps on TX2, whereas the YOLOv2 variants reach only 20 fps on Xavier and only about 7.5 fps on TX2. As for multi-scale detection, detecting at more scales makes the loss value converge faster.
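The intuition behind the softmax-to-logistic change can be sketched numerically. This is an illustration of the general technique, not the thesis code, and the three-class logit values are made up: a softmax forces all class scores to compete and sum to 1, so two visually similar mirror-image classes split the probability mass between them, while independent logistic (sigmoid) outputs score each class on its own.

```python
import math

def softmax(logits):
    # Mutually exclusive class probabilities: they always sum to 1.
    exps = [math.exp(z) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(z):
    # Independent per-class score in (0, 1); classes do not compete.
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical logits for classes [front, left, right], where the
# detector is fairly sure the face is sideways but unsure which side.
logits = [0.0, 2.0, 1.8]

print(softmax(logits))                # left/right split the probability mass
print([sigmoid(z) for z in logits])   # both sideways classes score high independently
```

With independent sigmoids, both mirror-image classes can receive a high score, so a small logit difference no longer suppresses the runner-up class the way a softmax does.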

The best-performing architecture overall is tiny YOLOv3 (two scale): although its accuracy figures are slightly below those of YOLOv2_logistic, its fps results are far ahead, making it very well suited to embedded platforms while still maintaining high accuracy, with strong potential for future deployment on intelligent self-propelled vehicles.
The rise of artificial intelligence has made it increasingly common to use trained deep learning models to detect and classify objects, and improvements in hardware have brought ever closer integration of AI with embedded platforms; intelligent self-propelled vehicles are one of the popular applications. Our system can be applied to intelligent self-propelled vehicles to analyze the images captured by the camera in real time, identifying the pedestrians in front and their facial directions.
To realize the deep learning model for face direction identification, this thesis selects pictures of pedestrians from three public databases and labels the faces; to enhance the training of mirror-image categories, the pictures are also flipped horizontally to expand the database. For the model architecture, the darknet deep learning framework is adopted, with YOLOv2 and tiny YOLOv3 used as base architectures for extension and comparison. The model takes a three-channel RGB image as input, extracts features through a large number of convolution and pooling layers, and then computes bounding-box and class predictions from the feature maps.
The experimental results show that after the class softmax is replaced by a logistic classifier, misjudgment between mirror-image classes is successfully reduced. Under the same parameter settings, YOLOv2_logistic has the highest recall, precision, and mAP on the webcam test set, at 85%, 81%, and 86.28% respectively, all exceeding 80%; second is tiny YOLOv3 (two scale), with recall, precision, and mAP of 81%, 80%, and 78.83%, the gap being caused by YOLOv2_logistic having more convolution layers than tiny YOLOv3. Because of that larger number of convolutional layers, however, the tiny YOLOv3 variants perform best in the fps tests, reaching 30 fps on Xavier and more than 20 fps on TX2, while the YOLOv2 variants reach only 20 fps on Xavier and only about 7.5 fps on TX2. In addition, for multi-scale detection, the more scales are used, the faster the loss value converges.
The best overall model architecture is tiny YOLOv3 (two scale). Although its results are slightly lower than those of YOLOv2_logistic, its fps results are far ahead, making it very suitable for embedded platforms while still maintaining high accuracy. This system has a great opportunity to be applied to smart self-propelled vehicles in the future.
Acknowledgments
Abstract (Chinese)
Abstract (English)
Table of Contents
List of Figures
List of Tables
1. Introduction
1.1. Research Motivation and Objectives
1.2. Challenges of Face Direction Identification
1.3. Thesis Organization
2. Literature Review
2.1. Color Spaces
2.2. Image Filtering and Feature Spaces
2.3. Deep Learning Models for Object Detection
2.4. Region-Based Convolutional Neural Networks (R-CNN)
3. Preliminaries
3.1. Artificial Intelligence (AI)
3.2. Machine Learning
3.3. Deep Learning
3.4. Neural Networks
3.5. Backpropagation (BP)
3.6. Convolutional Neural Networks
3.7. Convolution Layer
3.8. Zero Padding
3.9. Pooling Layer
3.10. Activation Functions
3.11. Fully Connected Layer
3.12. The YOLO (You Only Look Once) Object Detection Architecture
3.13. The YOLOv2 Architecture
3.14. Batch Normalization (BN) [12]
3.15. Anchor Boxes [12]
3.16. Multi-Scale Training [12]
3.17. The tiny YOLOv3 Architecture
4. Algorithm Implementation Flow
4.1. Data Collection and Preprocessing
4.1.1. Datasets
4.1.2. Definition of Face Directions
4.1.3. Labeling
4.1.4. Annotation Preprocessing
4.1.5. Data Augmentation: Horizontal Flipping
4.1.6. Dataset Splitting
4.1.7. Webcam Picture Dataset for Testing
4.2. Building the Model Configuration Files
4.3. Model Training Flow
4.4. FD Model Algorithm Flow
4.5. Effects of Layer Count, Architecture, and Class Classifier on the Model
4.6. Model Performance Evaluation
4.6.1. Recall and Precision
4.6.2. mAP (Mean Average Precision)
5. Experimental Results and Analysis
5.1. Test Set Analysis
5.2. YOLOv2 Model Test Results
5.3. tiny YOLOv3 Model Test Results and Comparison
5.4. tiny YOLOv3 Multi-Scale Detection Test Results and Comparison
5.5. YOLOv2_logistic Test Results and Comparison
5.6. fps Comparison of All Models
6. Conclusions and Future Work
6.1. Conclusions
6.2. Future Work
References
[1] "The relationship among AI, ML, and deep learning" [Online]. Available: https://nccuclement.wordpress.com/2017/07/11/ai-ml-deep-learning/
[2] "Machine learning: neural networks (multilayer perceptron, MLP), with a detailed derivation of backpropagation" [Online]. Available: https://reurl.cc/plLDe
[3] "[Data Analysis & Machine Learning] Lecture 5.1: Introduction to Convolutional Neural Networks" [Online]. Available: https://reurl.cc/q7kdR
[4] "Plots of the Sigmoid, Tanh, and ReLU functions" [Online]. Available: https://www.julyedu.com/question/big/kp_id/26/ques_id/1044
[5] "Everything you should know about deep learning models for image recognition" [Online]. Available: https://medium.com/@syshen/%E7%89%A9%E9%AB%94%E5%81%B5%E6%B8%AC-object-detection-740096ec4540
[6] "Self-propelled warehouse vehicles" [Online]. Available: http://www.digitimes.com.tw/iot/article.asp?cat=158&cat1=20&cat2=&id=0000524789_F0Q8OAXR7J5IWRLUJWTOA&social_share=y
[7] "HSL and HSV color spaces" [Online]. Available: https://zh.wikipedia.org/wiki/HSL%E5%92%8CHSV%E8%89%B2%E5%BD%A9%E7%A9%BA%E9%97%B4
[8] "Histogram of oriented gradients" [Online]. Available: https://zh.wikipedia.org/wiki/%E6%96%B9%E5%90%91%E6%A2%AF%E5%BA%A6%E7%9B%B4%E6%96%B9%E5%9B%BE
[9] "Feature Pyramid Network (FPN)" [Online]. Available: http://wiszhipu.com/?id=113
[10] "Multi-scale detection" [Online]. Available: https://bigdatafinance.tw/index.php/tech/coding/639-pytorch-yolo-v3
[11] "ROC curves and PR curves" [Online]. Available: https://www.cnblogs.com/Allen-rg/p/5821949.html
[12] Redmon, J., & Farhadi, A. (2017). YOLO9000: Better, faster, stronger. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 7263-7271).
[13] Dalal, N. (2006). Finding people in images and videos (Doctoral dissertation, Institut National Polytechnique de Grenoble-INPG).
[14] Okubo, J., Sugandi, B., Kim, H., Tan, J. K., & Ishikawa, S. (2008, October). Face direction estimation based on eigenspace technique. In 2008 International Conference on Control, Automation and Systems (pp. 1264-1267). IEEE.
[15] Wang, L., Shi, J., Song, G., & Shen, I. (2007). Object detection combining recognition and segmentation. In Asian Conference on Computer Vision (pp. 189-199). Springer, Berlin, Heidelberg.
[16] Dalal, N., & Triggs, B. (2005). Histograms of oriented gradients for human detection. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR'05) (pp. 886-893). IEEE Computer Society.
[17] Redmon, J., & Farhadi, A. (2018). YOLOv3: An incremental improvement. arXiv preprint arXiv:1804.02767.
[18] Uijlings, J. R. R., van de Sande, K. E. A., Gevers, T., & Smeulders, A. W. M. (2013). Selective search for object recognition. International Journal of Computer Vision, 104(2), 154-171.
[19] Abe, S., Morimoto, M., & Fujii, K. (2010). Estimating face direction from wide-view surveillance camera. In 2010 World Automation Congress (pp. 1-6). IEEE.
[20] "The PASCAL Visual Object Classes Homepage" [Online]. Available: http://host.robots.ox.ac.uk/pascal/VOC/
[21] "INRIA Person Dataset" [Online]. Available: http://pascal.inrialpes.fr/data/human/
[22] "Penn-Fudan Database for Pedestrian Detection and Segmentation" [Online]. Available: https://www.cis.upenn.edu/~jshi/ped_html/
[23] "GitHub: darknet" [Online]. Available: https://github.com/pjreddie/darknet
[24] "Feature visualization" [Online]. Available: https://github.com/jing-vision/lightnet/tree/master/feature-viz
[25] "LabelImg" [Online]. Available: https://github.com/tzutalin/labelImg