
National Digital Library of Theses and Dissertations in Taiwan


Detailed Record

Student: Lu, Ying-Cheng (呂映承)
Title (Chinese): 應用模型壓縮技術並使用28奈米CMOS製程的23-FPS Tiny-YOLOv4物件辨識CNN晶片系統
Title (English): A 23-FPS Tiny-YOLOv4 Object Detection CNN Accelerator with Model Compression in a 28nm CMOS Technology
Advisor: Chiueh, Her-Ming (闕河鳴)
Committee Members: Liao, Hong-Yuan; Tang, Kea-Tiong; Shuai, Hong-Han
Oral Defense Date: 2022-01-17
Degree: Master's
Institution: National Yang Ming Chiao Tung University
Department: Department of Electrical Engineering
Discipline: Engineering
Academic Field: Electrical and Computer Engineering
Thesis Type: Academic thesis
Year of Publication: 2022
Graduation Academic Year: 110 (ROC calendar; 2021–2022)
Language: English
Number of Pages: 64
Keywords (Chinese): YOLO、物件辨識、卷積神經網路、硬體加速器
Keywords (English): YOLO; Object Detection; Convolutional Neural Networks; Hardware Accelerator
Record Statistics:
  • Cited by: 0
  • Views: 240
  • Downloads: 0
  • Saved to reading lists: 0
摘要...........................................................i
Abstract......................................................ii
Acknowledgements.............................................iii
Contents......................................................iv
List of Tables................................................vi
List of Figures..............................................vii
Chapter 1 Introduction.......................................1
1.1 Object Detection...........................................1
1.2 Related Works..............................................2
1.2.1 Object Detection CNN Models............................2
1.2.2 Optimization Techniques...............................10
1.2.3 Hardware Accelerators.................................16
1.3 Objectives................................................36
1.4 Thesis Organization.......................................37
Chapter 2 Proposed System...................................38
2.1 Hardware Accelerator Implementation.......................38
2.2 Dataflow and FPS Estimation of the Hardware Accelerator...41
2.3 Proposed Object Detection CNN Model.......................43
2.4 Summary of the Proposed System............................49
Chapter 3 Experimental Results and Verification Results.....50
3.1 Implementation Environment................................50
3.2 System Implementation and Simulation......................51
3.3 Verification Environment..................................55
3.4 System Verification and Demonstration on FPGA.............55
3.5 Comparison to Related Works...............................57
Chapter 4 Conclusion and Future Work..........................59
Reference.....................................................60