臺灣博碩士論文加值系統 (National Digital Library of Theses and Dissertations in Taiwan)


Detailed Record

Graduate Student: 蕭亦珊
Graduate Student (English): Hsiao, Yi-Shan
Thesis Title: FPGA 加速 YOLOv7 於嵌入式系統之運算效能
Thesis Title (English): FPGA Acceleration of YOLOv7 Inferences on Embedded Systems
Advisor: 陳依蓉
Advisor (English): Chen, Yi-Jung
Committee Members: 石勝文, 劉一宇
Committee Members (English): Shih, Sheng-Wen; Liu, Yi-Yu
Defense Date: 2024-06-17
Degree: Master's
Institution: 國立暨南國際大學 (National Chi Nan University)
Department: Department of Computer Science and Information Engineering (資訊工程學系)
Discipline: Engineering
Field: Electrical and Computer Engineering
Thesis Type: Academic thesis
Publication Year: 2024
Graduation Academic Year: 112
Language: Chinese
Pages: 81
Keywords (Chinese): 現場可程式化邏輯閘陣列, 嵌入式系統, YOLOv7, 深度學習處理器
Keywords (English): Field Programmable Gate Array, Embedded Systems, YOLOv7, Deep Learning Processing Unit
DOI: 10.6837/ncnu202400162
Statistics:
  • Cited by: 0
  • Views: 46
  • Downloads: 14
Abstract:
Field Programmable Gate Arrays (FPGAs) can operate in two computing modes: cloud and edge. Cloud computing transmits data to data centers for processing and analysis; however, with the growth of cloud devices and the Internet of Things, data volumes have increased sharply, leading to latency and potential data loss during transmission. In contrast, edge computing does not require data to be sent to a data center, which reduces network bandwidth and server load and improves computing efficiency, making it particularly well suited to embedded-system applications.
To evaluate performance on the XILINX FPGA ZCU102, the Vitis-AI development platform must first be set up. This thesis evaluates the YOLOv7-Tiny and YOLOv7 models, testing different Deep Learning Processing Unit (DPU) configurations and different image sizes. YOLOv7-Tiny is trained on the COCO dataset, which covers many object classes, while YOLOv7 is trained on the VOC 2017 dataset.
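For readers unfamiliar with this flow, the following is a minimal sketch, not the thesis's own code, of how a compiled .xmodel is typically executed on the DPU through the Vitis AI Runtime (VART) Python API. The file name yolov7_tiny.xmodel is a placeholder, and the zero-filled int8 buffers stand in for real pre-processed frames, which would additionally need scaling according to each tensor's quantization fix point.

import numpy as np
import vart
import xir

# Load the compiled model and pick out the DPU-mapped subgraph.
graph = xir.Graph.deserialize("yolov7_tiny.xmodel")  # placeholder file name
subgraphs = graph.get_root_subgraph().toposort_child_subgraph()
dpu_subgraph = next(s for s in subgraphs
                    if s.has_attr("device") and s.get_attr("device").upper() == "DPU")

# Create a runner and allocate buffers shaped to the model's tensors.
runner = vart.Runner.create_runner(dpu_subgraph, "run")
in_dims = tuple(runner.get_input_tensors()[0].dims)   # e.g. (1, 640, 640, 3)
in_bufs = [np.zeros(in_dims, dtype=np.int8)]          # stand-in for a real frame
out_bufs = [np.empty(tuple(t.dims), dtype=np.int8)
            for t in runner.get_output_tensors()]

# Submit one inference job to the DPU and block until it finishes.
job_id = runner.execute_async(in_bufs, out_bufs)
runner.wait(job_id)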
Experimental results show that YOLOv7-Tiny reaches 54.6% accuracy, while YOLOv7 reaches 73.8%. In real-time detection, YOLOv7-Tiny achieves 30 frames per second (FPS), demonstrating strong performance in both accuracy and speed. The original, unpruned YOLOv7 runs more slowly, reaching only 23 FPS, which falls short of applications with stricter speed requirements.
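Throughput figures of this kind are conventionally obtained with a timed loop over a stream of frames. The sketch below is an illustrative assumption rather than the thesis's measurement code: run_inference is a hypothetical callable that wraps pre-processing, DPU execution, and post-processing for one frame.

import time

def measure_fps(run_inference, frames, warmup=10):
    # Discard warm-up iterations so one-time setup costs do not skew the rate.
    for frame in frames[:warmup]:
        run_inference(frame)
    start = time.perf_counter()
    for frame in frames[warmup:]:
        run_inference(frame)
    elapsed = time.perf_counter() - start
    return (len(frames) - warmup) / elapsed

# Toy stand-in so the sketch runs on its own; in practice the callable
# would drive the DPU runner on real camera frames.
if __name__ == "__main__":
    print(f"{measure_fps(lambda f: time.sleep(0.001), [None] * 110):.1f} FPS")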
To improve YOLOv7's execution efficiency, model compression and pruning were applied. Compression reduces the parameter count, and pruning removes redundant neurons and connections, accelerating inference. After these optimizations, YOLOv7 reaches 25 FPS with two threads, significantly improving its suitability for real-time detection.
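Channel pruning of this kind commonly ranks a convolution layer's output filters by L1 norm and discards the weakest ones; the thesis's exact procedure is not reproduced here. The sketch below is a minimal PyTorch illustration under that assumption and handles plain convolutions only (no groups or dilation); after pruning, the next layer's input channels must be sliced with the returned indices, and the network fine-tuned to recover accuracy.

import torch
import torch.nn as nn

def prune_conv_channels(conv, keep_ratio=0.75):
    # Rank each output filter by the L1 norm of its weights.
    l1 = conv.weight.detach().abs().sum(dim=(1, 2, 3))
    n_keep = max(1, int(conv.out_channels * keep_ratio))
    keep_idx = torch.argsort(l1, descending=True)[:n_keep].sort().values

    # Build a slimmer layer and copy over the surviving filters.
    pruned = nn.Conv2d(conv.in_channels, n_keep, conv.kernel_size,
                       stride=conv.stride, padding=conv.padding,
                       bias=conv.bias is not None)
    pruned.weight.data = conv.weight.data[keep_idx].clone()
    if conv.bias is not None:
        pruned.bias.data = conv.bias.data[keep_idx].clone()
    return pruned, keep_idx  # keep_idx slices the next layer's input channels

# Example: prune a 64-channel conv down to 48 channels.
conv = nn.Conv2d(3, 64, 3, padding=1)
slim, kept = prune_conv_channels(conv, keep_ratio=0.75)
print(slim.out_channels, kept.shape)  # 48, torch.Size([48])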



Table of Contents
Acknowledgments
Abstract (Chinese)
Abstract (English)
Table of Contents
List of Tables
List of Figures
Chapter 1: Research Background and Objectives
1 Research Motivation and Challenges
2 Review of Related Literature
3 Thesis Organization
Chapter 2: Background
1 Basic Neural Network Architecture
1.1 Convolutional Neural Network (CNN) Architecture
1.2 Pooling Layer
1.3 Fully Connected Layer
1.4 Neural Network Training Architecture
1.5 Overfitting
1.6 Object Detection Methods
1.7 Performance Metrics
2 Introduction to YOLO
2.1 Faster R-CNN (Faster Regions with CNN Features)
2.2 YOLO (You Only Look Once)
2.3 YOLOv3 and YOLOv4
2.4 YOLOv7
2.5 YOLOv7-Tiny
3 FPGA Platform
3.1 Zynq UltraScale+ MPSoC ZCU102
3.2 DPU Overlay
3.3 Vitis-AI Environment
Chapter 3: Literature Review
1 Performance Optimization Methods in Deep Neural Network Design
2 Field Programmable Gate Arrays (FPGAs)
Chapter 4: Experimental Methods
1 Edge Computing Platform
1.1 Experimental Equipment
1.2 Model Training
1.3 Channel Pruning
1.4 Model Deployment Method
Chapter 5: Analysis of Experimental Data
1 Data Evaluation
1.1 Detection Model Speed
1.2 Computational Performance Analysis
1.3 Deep Learning Processing Unit (DPU)
1.4 Detection Results Showcase
Chapter 6: Conclusion and Future Outlook
1 Conclusion
2 Future Outlook
References

List of Tables
Table 2.1 Confusion Matrix
Table 2.2 Zynq UltraScale+ MPSoC ZCU102 Evaluation Kit
Table 4.1 Hardware Configuration and Software Versions
Table 4.2 Deployment Tool Versions
Table 5.1 DPU Convolution Architecture
Table 5.2 DPUCZDX8G Architecture B4096 Computational Performance Analysis
Table 5.3 DPUCZDX8G Architecture B3136 Computational Performance Analysis
Table 5.4 Webcam Specifications
Table 5.5 Inference Speed with the WEBM Video Format and a Webcam

List of Figures
Figure 1.1 Object detection
Figure 2.1 Feedforward Neural Network (FNN)
Figure 2.2 Convolutional Neural Network (CNN) architecture
Figure 2.3 Kernel convolution operation
Figure 2.4 Convolution after padding
Figure 2.5 Convolution of three RGB channels with one kernel
Figure 2.6 Max pooling and average pooling
Figure 2.7 Connection between the flatten layer and the fully connected layer
Figure 2.8 Neuron connection operations
Figure 2.9 Deep Neural Network
Figure 2.10 Overfitting
Figure 2.11 Grid diagram
Figure 2.12 Precision, recall, and IoU
Figure 2.13 Object detection algorithm pipeline
Figure 2.14 Grid cell concept
Figure 2.15 Modified YOLOv7 architecture
Figure 2.16 Difference between RepConv and RepConvN
Figure 2.17 YOLOv7 ELAN architecture
Figure 2.18 YOLOv7 MP1 architecture
Figure 2.19 Applying CSPNet to ResNe(X)t
Figure 2.20 YOLOv7 SPPCSP architecture
Figure 2.21 YOLOv7 ELAN6 architecture
Figure 2.22 YOLOv7 RepConv architecture
Figure 2.23 Modified YOLOv7-Tiny architecture
Figure 2.24 YOLOv7-Tiny ELAN architecture
Figure 2.25 YOLOv7-Tiny SPPCSP architecture
Figure 2.26 FPGA internal architecture
Figure 2.27 CLB internal architecture
Figure 2.28 Zynq UltraScale+ MPSoC ZCU102
Figure 2.29 DPUCZDX8G block diagram
Figure 2.30 DPUCZDX8G hardware architecture
Figure 2.31 Vitis-AI environment architecture
Figure 2.32 Vitis AI Quantizer
Figure 2.33 Vitis AI Compiler
Figure 4.1 Real-time detection system
Figure 4.2 Experimental equipment
Figure 4.3 YOLOv7-Tiny-640 precision-recall curve
Figure 4.4 YOLOv7-Tiny-512 precision-recall curve
Figure 4.5 YOLOv7-640 precision-recall curve
Figure 4.6 YOLOv7-512 precision-recall curve
Figure 4.7 YOLOv7-Tiny-640 F1-confidence curve
Figure 4.8 YOLOv7-Tiny-512 F1-confidence curve
Figure 4.9 YOLOv7-640 F1-confidence curve
Figure 4.10 YOLOv7-512 F1-confidence curve
Figure 4.11 YOLOv7-Tiny results
Figure 4.12 YOLOv7 results
Figure 4.13 YOLOv7-Pruning-512 precision-recall curve
Figure 4.14 YOLOv7-Pruning-512 F1-confidence curve
Figure 4.15 YOLOv7-Pruning-640 precision-recall curve
Figure 4.16 YOLOv7-Pruning-640 F1-confidence curve
Figure 4.17 Vitis-AI Docker environment
Figure 4.18 Model deployment flowchart
Figure 4.19 Model contents before and after quantization
Figure 4.20 File sizes before and after quantization
Figure 4.21 Quantized model and compiled model
Figure 4.22 Comparison of training and quantization
Figure 5.1 Laptop connected to the embedded system
Figure 5.2 Model execution speed
Figure 5.3 Average image-loading time per model
Figure 5.4 Time spent on image loading and real-time detection
Figure 5.5 Time spent on image loading and real-time detection
Figure 5.6 Memory I/O during image loading and real-time detection
Figure 5.7 Memory bandwidth during image loading and real-time detection
Figure 5.8 DPUCZDX8G convolution architecture
Figure 5.9 DPUCZDX8G architecture resources
Figure 5.10 DPU integrated system
Figure 5.11 ZCU102 real-time detection results














