National Digital Library of Theses and Dissertations in Taiwan

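Author: Yu-Cheng Hsueh (薛宇呈)
Thesis title (Chinese): 高效能頻寬使用之AI深度學習硬體加速架構設計與實現
Thesis title (English): High Efficient Bandwidth Utilization Hardware Design and Implement for AI Deep Learning Accelerator
Advisor: 吳崇賓
Oral defense committee: 陳春僥, 湯雲欽
Oral defense date: 2019-07-16
Degree: Master's
Institution: National Chung Hsing University (國立中興大學)
Department: Department of Electrical Engineering
Discipline: Engineering
Field: Electrical Engineering and Computer Science
Thesis type: Academic thesis
Publication year: 2019
Graduation academic year: 107 (ROC calendar)
Language: Chinese
Number of pages: (not listed)
Keywords (translated from Chinese): convolutional neural network, accelerator, deep learning
Keywords (English): CNN, Accelerator, Deep learning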


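Abstract (translated from the Chinese):

This thesis proposes a neural network accelerator for Tiny-Yolo V2. Through a quantization strategy, the input, output, and weight data are converted to uint8, which reduces the data size and lowers hardware resource usage. On the data-access side, the feature map sizes in the later layers of Tiny-Yolo V2 do not allow the bandwidth to be used effectively, so the thesis proposes a placement scheme for input feature maps (IF maps) that raises bandwidth utilization. The processing elements (PEs) then perform the convolution and requantization operations to produce the network output. Finally, Vivado is used to verify the computation and Design Compiler is used for circuit synthesis, yielding the final Tiny-Yolo V2 hardware accelerator.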
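The abstract mentions uint8 quantization of inputs, outputs, and weights, followed by a requantize step after convolution. This record does not give the thesis's exact formulas, so the following is only a minimal sketch of a common affine (scale and zero-point) scheme. The function names, scales, and zero points are hypothetical and chosen for illustration.

```python
import numpy as np

def quantize(x, scale, zero_point):
    """Map real values to uint8 with an affine scheme (assumed, not from the thesis)."""
    q = np.round(np.asarray(x, dtype=np.float64) / scale) + zero_point
    return np.clip(q, 0, 255).astype(np.uint8)

def quantized_dot(x_q, w_q, x_zp, w_zp):
    """Integer multiply-accumulate on zero-point-corrected uint8 operands."""
    x_i = x_q.astype(np.int32) - x_zp
    w_i = w_q.astype(np.int32) - w_zp
    return int(np.sum(x_i * w_i))

def requantize(acc, in_scale, w_scale, out_scale, out_zero_point):
    """Rescale a wide integer accumulator back into the uint8 output range."""
    multiplier = in_scale * w_scale / out_scale
    q = np.round(np.asarray(acc, dtype=np.int64) * multiplier) + out_zero_point
    return np.clip(q, 0, 255).astype(np.uint8)

# Illustrative values only.
x_q = quantize([0.5, 1.0], scale=0.25, zero_point=0)   # uint8 activations
w_q = np.array([3, 5], dtype=np.uint8)                  # uint8 weights, zero point 1
acc = quantized_dot(x_q, w_q, x_zp=0, w_zp=1)           # wide integer accumulator
y_q = requantize(acc, 0.25, 0.5, 0.5, out_zero_point=3) # back to uint8
```

In hardware the floating-point multiplier is usually replaced by a fixed-point multiply and shift. The sketch keeps it in floating point for readability.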

Abstract (English, as provided by the author):

This thesis proposes a neural network accelerator for Tiny-Yolo V2. A quantization strategy converts the input, output, and weight data to uint8, which reduces the data size and makes hardware utilization more efficient. Because the size of the input feature maps (IF maps) in the later layers of Tiny-Yolo V2 is not 64-bit compatible, we propose an IF map placement method that improves bandwidth utilization, and we arrange the processing elements (PEs) to compute the convolution and requantization that produce the network output. Finally, the Vivado environment is used to verify the hardware on an FPGA, and the design is synthesized with Design Compiler to obtain the final Tiny-Yolo V2 hardware accelerator.
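To see why a feature map that is "not 64-bit compatible" wastes bandwidth, consider a 64-bit bus carrying eight uint8 values per word. The sketch below compares two hypothetical layouts: one where each feature map row starts on a word boundary, and one where the whole map is packed contiguously. The 13-wide example assumes the common 416×416 Tiny-Yolo V2 input, for which the last layers are 13×13. The thesis's actual placement scheme is not described in this record, so both layouts are illustrative assumptions.

```python
import math

BUS_BYTES = 8  # 64-bit bus, one uint8 element per byte

def row_aligned_utilization(width, bus_bytes=BUS_BYTES):
    """Fraction of transferred bytes that are useful when every row
    starts on a bus-word boundary (the row tail is padding)."""
    words_per_row = math.ceil(width / bus_bytes)
    return width / (words_per_row * bus_bytes)

def packed_utilization(width, height, bus_bytes=BUS_BYTES):
    """Fraction of useful bytes when rows are stored back to back and
    only the final word of the map is padded."""
    total = width * height
    words = math.ceil(total / bus_bytes)
    return total / (words * bus_bytes)

for w in (416, 208, 104, 52, 26, 13):
    print(w, row_aligned_utilization(w), packed_utilization(w, w))
```

Under these assumptions a 13-element row occupies two 8-byte words, so 3 of every 16 transferred bytes are padding, while widths that are multiples of 8 (such as 416, 208, and 104) lose nothing.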

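Table of contents:

Abstract (Chinese)
Abstract (English)
Table of Contents
List of Tables
List of Figures
Chapter 1 Introduction
1.1 Research Motivation
1.2 Thesis Organization
Chapter 2 Literature Review
2.1 Prior Designs
2.2 Lightweight CNNs
2.3 Tiny-Yolo v2
Chapter 3 Methodology
3.1 Bandwidth Analysis
3.2 Architecture Overview
3.2.1 Data Placement in DRAM
3.2.2 SRAM Data Read and Write
3.2.3 Data Flow
Chapter 4 Experimental Results and Discussion
4.1 Obtaining the Verification Data
4.2 Hardware Accelerator Implementation and Design Comparison
4.2.1 FPGA Verification of the Hardware Accelerator
4.2.2 ASIC Implementation of the Hardware Accelerator
Chapter 5 Conclusion and Future Work
5.1 Conclusion
5.2 Future Work
References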

