National Digital Library of Theses and Dissertations in Taiwan
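Author: 薛宇呈
Author (English name): Yu-Cheng Hsueh
Thesis title: 高效能頻寬使用之AI深度學習硬體加速架構設計與實現
Thesis title (English): High Efficient Bandwidth Utilization Hardware Design and Implement for AI Deep Learning Accelerator
Advisor: 吳崇賓
Oral defense committee: 陳春僥, 湯雲欽
Oral defense date: 2019-07-16
Degree: Master's
Institution: National Chung Hsing University
Department: Department of Electrical Engineering
Discipline: Engineering
Field: Electrical and Computer Engineering
Thesis type: Academic thesis
Publication year: 2019
Academic year of graduation: 107 (ROC calendar)
Language: Chinese
Pages: 45
Keywords (translated from Chinese): convolutional neural network; accelerator; deep learning
Keywords (English): CNN; Accelerator; Deep learning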
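This thesis proposes a neural network accelerator for Tiny-Yolo V2. Through a quantization strategy, the input, output, and weight data formats are converted to uint8 to reduce data size and lighten hardware resource usage. For data access, the feature map sizes in the later layers of Tiny-Yolo V2 cannot use the bandwidth effectively, so this thesis proposes a data placement scheme for the input feature maps (IF maps) that raises bandwidth utilization; processing elements (PEs) then perform the convolution and requantize operations to produce the network output. Finally, Vivado is used to verify the computation and Design Compiler is used for circuit synthesis, yielding the final Tiny-Yolo V2 hardware accelerator.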
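The abstract does not spell out the quantization scheme, so the following is only a minimal sketch of one common approach: affine uint8 quantization with an int32 accumulator that is rescaled ("requantized") back to uint8. The function names, the scale and zero-point parameters, and the floating-point multiplier are all assumptions made for illustration and are not taken from the thesis.

```python
import numpy as np

def quantize(x, scale, zero_point):
    """Affine quantization to uint8: q = round(x / scale) + zero_point (assumed scheme)."""
    q = np.round(np.asarray(x, dtype=np.float64) / scale) + zero_point
    return np.clip(q, 0, 255).astype(np.uint8)

def dequantize(q, scale, zero_point):
    """Map uint8 codes back to real values."""
    return scale * (np.asarray(q).astype(np.int32) - zero_point)

def quantized_dot(x_q, w_q, x_zp, w_zp):
    """Integer multiply-accumulate, as a PE might do for one output pixel."""
    x_i = np.asarray(x_q).astype(np.int32) - x_zp
    w_i = np.asarray(w_q).astype(np.int32) - w_zp
    return np.sum(x_i * w_i)

def requantize(acc, in_scale, w_scale, out_scale, out_zero_point):
    """Rescale an integer accumulator into the uint8 range of the next layer."""
    multiplier = in_scale * w_scale / out_scale  # hypothetical float multiplier
    q = np.round(np.asarray(acc, dtype=np.int64) * multiplier) + out_zero_point
    return np.clip(q, 0, 255).astype(np.uint8)

# Usage: a two-element dot product carried through the integer path.
x_q = quantize([1.0, 2.0], scale=0.5, zero_point=0)
w_q = quantize([0.5, 0.25], scale=0.25, zero_point=0)
acc = quantized_dot(x_q, w_q, 0, 0)
y_q = requantize(acc, 0.5, 0.25, 0.125, 0)
y = dequantize(y_q, 0.125, 0)
```

In hardware the floating-point multiplier would typically be replaced by a fixed-point multiply and shift; the sketch keeps it in floating point for readability.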
This thesis proposes a neural network accelerator for Tiny-Yolo V2. A quantization strategy converts the input, output, and weight data to uint8, which reduces the data size and the hardware resources required. Because the sizes of the input feature maps (IF maps) in the later layers of Tiny-Yolo V2 are not 64-bit compatible, we propose an IF map placement method that improves bandwidth utilization, and we arrange processing elements (PEs) to perform the convolution and quantization operations that produce the network output. Finally, the hardware is verified on an FPGA in the Vivado environment, and the design is synthesized with Design Compiler to obtain the final Tiny-Yolo V2 hardware accelerator.
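To illustrate why feature map sizes that are not 64-bit compatible waste bandwidth, the sketch below compares two hypothetical DRAM layouts for uint8 feature maps on a 64-bit (8-byte) bus: one where every row starts on a word boundary, and one where the whole map is packed contiguously. The 13x13 and 416x416 map sizes assume the standard 416x416 Tiny-Yolo V2 input, and neither layout is claimed to be the placement method proposed in the thesis.

```python
import math

BUS_BYTES = 8  # assumed 64-bit word holding eight uint8 values

def words_row_aligned(height, width, bus_bytes=BUS_BYTES):
    """Words fetched when each row starts on a word boundary (row tail is padding)."""
    return height * math.ceil(width / bus_bytes)

def words_packed(height, width, bus_bytes=BUS_BYTES):
    """Words fetched when the whole map is stored contiguously."""
    return math.ceil(height * width / bus_bytes)

def utilization(height, width, words, bus_bytes=BUS_BYTES):
    """Fraction of transferred bytes that carry useful feature map data."""
    return height * width / (words * bus_bytes)

# Usage: compare an early layer (416x416) with a late layer (13x13).
for h, w in [(416, 416), (13, 13)]:
    aligned = words_row_aligned(h, w)
    packed = words_packed(h, w)
    print(f"{h}x{w}: row-aligned {utilization(h, w, aligned):.4f}, "
          f"packed {utilization(h, w, packed):.4f}")
```

Under these assumptions a 416-byte row fills 52 words exactly, while a 13-byte row occupies two words and leaves three of the sixteen transferred bytes unused.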
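Abstract (Chinese)
Abstract (English)
Table of Contents
List of Tables
List of Figures
Chapter 1 Introduction
1.1 Research Motivation
1.2 Thesis Organization
Chapter 2 Literature Review
2.1 Prior Designs
2.2 Lightweight CNNs
2.3 Tiny-Yolo v2
Chapter 3 Methodology
3.1 Bandwidth Analysis
3.2 Architecture Overview
3.2.1 Data Placement in DRAM
3.2.2 SRAM Data Read and Write
3.2.3 Data Flow
Chapter 4 Experimental Results and Discussion
4.1 Obtaining the Verification Data
4.2 Hardware Accelerator Implementation and Design Comparison
4.2.1 FPGA Verification of the Hardware Accelerator
4.2.2 ASIC Implementation of the Hardware Accelerator
Chapter 5 Conclusion and Future Work
5.1 Conclusion
5.2 Future Work
References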