Author: ACHARJEE, SUVAJIT (蘇沃杰)
Title: Hardware Efficient Accelerator for Binary Convolution Neural Network Inference
Title (Chinese): 應用於二值化卷積神經網路之高效率硬體加速器
Advisor: Chang, Tian-Sheuan (張添烜)
Oral defense date: 2019-05-31
Degree: Master's
Institution: National Chiao Tung University (國立交通大學)
Department: EECS International Graduate Program (電機資訊國際學程)
Discipline: Engineering
Field: Electrical Engineering and Computer Science
Thesis type: Academic thesis
Year of publication: 2019
Graduation academic year: 107 (ROC calendar)
Language: English
Keywords: Machine learning, FPGA, Convolution Neural Network, Binary Neural Network, Accelerator


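Abstract (Chinese version): Binary neural networks (BNNs) are a current research topic that is steadily improving to broaden uses in computer vision such as recognition, object detection, and depth perception. However, most existing designs have low hardware utilization or are overly complex, which makes the hardware cost of the circuit too high. In addition, a large amount of computational redundancy remains in BNN inference. To overcome these problems of hardware utilization and computational complexity, this design adopts a systolic array architecture with binary inputs and weights. Because weights and activations can each be stored as a single bit (+1 stored as 1, -1 stored as 0), computational complexity is greatly reduced. Moreover, replacing MAC operations with bitwise operations resolves the computation problem. The design uses eight PEs, each processing in parallel with its own accumulator, and each PE block performs convolution with a 3x3 kernel. Throughput increases, with an operating frequency of at most 188.67 MHz and at least 125 MHz. Our results show that with eight PEs the design reaches and supports 12.85 GOPS, a 10x improvement in area efficiency compared with other results. Power consumption after simulating the synthesized RTL is 14 mW. The architecture was successfully implemented in Xilinx ISE 14.7 on a Spartan 6 series FPGA. Compared with other state-of-the-art works, the design has better area and bandwidth efficiency.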

Abstract (English version): Binary neural networks (BNNs) are an active research topic, steadily improving for computer vision uses such as recognition, object detection, and depth perception. However, most existing designs suffer from low hardware utilization or complex circuits that result in high hardware cost. In addition, a large amount of computational redundancy still exists in BNN inference. To overcome these issues of hardware utilization and computational complexity, this design adopts a systolic array architecture that takes binarized inputs and weights. Computational complexity is drastically reduced because weights and activations can each be stored as a single bit: +1 is stored as 1 and -1 is stored as 0. In addition, the computation problem is addressed by replacing MAC operations with bitwise operations. The design uses eight PEs, each processing in parallel with its own accumulator, and each PE block performs convolution with a 3x3 kernel. Throughput is increased, with an operating frequency of at most 188.67 MHz and at least 125 MHz. Our results show that, with eight PEs, the design achieves and supports 63.168 GOPS, which is 10x more area efficient than other results. The power consumption after simulating the synthesized RTL is 0.014 W. The architecture was implemented successfully in Xilinx ISE 14.7 on a Spartan 6 series FPGA. The design also shows better area and bandwidth efficiency than other state-of-the-art works.
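The bit encoding and the replacement of MAC operations by bitwise operations described in the abstract can be illustrated with a small software sketch. It is not the thesis's RTL: it assumes the common XNOR-plus-popcount formulation, in which the dot product of two +1/-1 vectors of length n equals 2 * (number of matching bits) - n. The function names pack_bits, xnor_popcount_dot, and binary_conv3x3 are hypothetical, chosen only for this illustration, and the loop models a single 3x3 window at a time rather than eight parallel PEs.

```python
def pack_bits(values):
    """Pack a sequence of +1/-1 values into an int: +1 -> bit 1, -1 -> bit 0."""
    word = 0
    for i, v in enumerate(values):
        if v not in (1, -1):
            raise ValueError("values must be +1 or -1")
        if v == 1:
            word |= 1 << i
    return word


def xnor_popcount_dot(a_bits, w_bits, n):
    """Dot product of two packed +1/-1 vectors of length n.

    XNOR marks the positions where the two bits agree (product +1);
    every other position contributes -1, so dot = 2 * matches - n.
    """
    mask = (1 << n) - 1                      # keep only the n valid bits
    matches = bin(~(a_bits ^ w_bits) & mask).count("1")
    return 2 * matches - n


def binary_conv3x3(image, kernel):
    """Valid (no padding) 3x3 sliding-window convolution on +1/-1 data.

    image is a 2-D list of +1/-1 values, kernel is a 3x3 list of +1/-1.
    Assumption: the kernel is applied without flipping (cross-correlation).
    """
    k_bits = pack_bits([v for row in kernel for v in row])
    height, width = len(image), len(image[0])
    out = []
    for r in range(height - 2):
        out_row = []
        for c in range(width - 2):
            window = [image[r + i][c + j] for i in range(3) for j in range(3)]
            out_row.append(xnor_popcount_dot(pack_bits(window), k_bits, 9))
        out.append(out_row)
    return out
```

For a 3x3 window the result is always an odd integer between -9 and +9, so a small signed accumulator per window is sufficient in this sketch.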

Table of Contents
List of Figures
List of Tables
Chapter 1. Introduction
1.1. Motivation
1.2. Thesis organization
Chapter 2. Overview of Binary Convolutional Neural Network
2.1. Overview of CNN algorithm
2.1.1. Convolutional layer
2.1.2. Pooling layer
2.1.3. Classification-Fully connected layer
2.1.4. Activation function
2.2. Popular networks
2.2.1. LeNet
2.2.2. AlexNet
2.2.3. ZFNet
2.2.4. VGGNet
2.2.5. GoogLeNet
2.2.6. ResNet
2.2.7. Summary
2.3. Challenges in CNN implementation
2.4. Related work
2.4.1. Background and Motivation
2.4.2. FPGA designs
Chapter 3. Hardware Design and Implementation
3.1. System architecture & data flow
3.1.1. Challenges of the architecture design
3.1.2. Processing Engine
3.1.3. XNOR operations
3.1.4. Accumulator Design
3.1.5. Max Pooling Design
3.1.6. Operations and waveform discussions
3.1.7. Placement, floor planning
3.1.8. Plan Ahead
Chapter 4. Results and Comparison
4.1. Result Evaluation
4.1.1. Prototype Implementation
4.1.2. Energy Efficiency
4.1.3. Frequency Analysis
4.1.4. Power Analysis
4.1.5. Comparison with prior works
Here the comparison is made with the prior work [18], relative to which the efficiency can be said to be 2x better. The comparison may not be entirely fair because the designs use different input sizes; our design's input size is 16x16 bits.
4.1.6. Layer by layer computational cost in VGG-16
4.1.7. Layer by layer computational cost in AlexNet
4.1.8. FPGA resource utilization summary
Chapter 5. Conclusion and Future Work
5.1. Conclusion
5.2. Future work
References

[1] D. Tomè, F. Monti, L. Baroffio, L. Bondi, M. Tagliasacchi, and S. Tubaro, “Deep convolutional neural networks for pedestrian detection,” 2015.
[2] P. Sermanet, D. Eigen, X. Zhang, M. Mathieu, R. Fergus, and Y. LeCun, “OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks,” 2013.
[3] R. M. Krauss and A. V. Nichols, “Metabolic Interrelationships of HDL Subclasses,” Lipoprotein Defic. Syndr., pp. 17–27, 2012.
[4] T. He, W. Huang, Y. Qiao, and J. Yao, “Text-Attentional Convolutional Neural Network for Scene Text Detection,” IEEE Trans. Image Process., vol. 25, no. 6, pp. 2529–2541, 2016.
[5] R. Girshick, J. Donahue, T. Darrell, and J. Malik, “Rich feature hierarchies for accurate object detection and semantic segmentation,” Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit., pp. 580–587, 2014.
[6] K. He, X. Zhang, S. Ren, and J. Sun, “Deep Residual Learning for Image Recognition,” Proc. IEEE Conf. Comput. Vis. Pattern Recognit., pp. 770–778, 2016.
[7] S. Ji, M. Yang, K. Yu, and W. Xu, “3D convolutional neural networks for human action recognition,” ICML, Int. Conf. Mach. Learn., vol. 35, no. 1, pp. 221–31, 2010.
[8] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet Classification with Deep Convolutional Neural Networks,” Adv. Neural Inf. Process. Syst., pp. 1–9, 2012.
[9] K. Simonyan and A. Zisserman, “Very Deep Convolutional Networks for Large-Scale Image Recognition,” pp. 1–14, 2014.
[10] G. Smoluk, “Google net,” Mod. Plast., vol. 57, no. 3, pp. 62–63, 1980.
[11] H. Zhou, W. Ouyang, J. Cheng, X. Wang, and H. Li, “Deep Continuous Conditional Random Fields with Asymmetric Inter-object Constraints for Online Multi-object Tracking,” IEEE Trans. Circuits Syst. Video Technol., pp. 1–12, 2018.
[12] Q. Zhang, “Convolutional Neural Networks,” 3rd Int. Conf. Electromechanical Control Technol. Transp., 2018.
[13] F. Nagata et al., “Design Tool of Deep Convolutional Neural Network for Intelligent Visual Inspection,” IOP Conf. Ser. Mater. Sci. Eng., vol. 423, p. 012073, 2018.
[14] G. Cao, S. Ruan, Y. Peng, S. Huang, and N. Kwok, “Large-Complex-Surface Defect Detection by Hybrid Gradient Threshold Segmentation and Image Registration,” IEEE Access, vol. 6, no. May, pp. 36235–36246, 2018.
[15] W. Yang, “yangOLWcvpr16.”
[16] T. Pfister, J. Charles, and A. Zisserman, “Flowing ConvNets for Human Pose Estimation in Videos,” Proc. IEEE Int. Conf. Comput. Vis. (ICCV), 2015.
[17] O. Russakovsky et al., “ImageNet Large Scale Visual Recognition Challenge,” Int. J. Comput. Vis., vol. 115, no. 3, pp. 211–252, 2015.
[18] C. Fu, S. Zhu, H. Su, C.-E. Lee, and J. Zhao, “Towards Fast and Energy-Efficient Binarized Neural Network Inference on FPGA,” no. 2, 2018.
[19] J. Qiu et al., “Going Deeper with Embedded FPGA Platform for Convolutional Neural Network,” pp. 26–35, 2016.
[20] C. Zhang, P. Li, G. Sun, Y. Guan, B. Xiao, and J. Cong, “Optimizing FPGA-based Accelerator Design for Deep Convolutional Neural Networks,” Proc. 2015 ACM/SIGDA Int. Symp. Field-Programmable Gate Arrays - FPGA ’15, pp. 161–170, 2015.
[21] V. Gokhale, J. Jin, A. Dundar, B. Martini, and E. Culurciello, “A 240 G-ops/s mobile coprocessor for deep neural networks,” IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit. Work., pp. 696–701, 2014.
[22] M. Peemen, A. A. A. Setio, B. Mesman, and H. Corporaal, “Memory-centric accelerator design for convolutional neural networks,” 2013 IEEE 31st Int. Conf. Comput. Des. ICCD 2013, no. 2013, pp. 13–19, 2013.
[23] Y. Bengio, “Deep Learning tutorial 0.1,” Nips, 2015.
[24] M. D. Zeiler and R. Fergus, “Visualizing and Understanding Convolutional Networks,” Proc. Eur. Conf. Comput. Vis. (ECCV), pp. 818–833, 2014.
[25] Y. J. Lin and T. S. Chang, “Data and Hardware Efficient Design for Convolutional Neural Network,” IEEE Trans. Circuits Syst. I Regul. Pap., vol. 65, no. 5, pp. 1642–1651, 2018.
[26] X. Glorot, A. Bordes, and Y. Bengio, “Deep Sparse Rectifier Neural Networks,” vol. 15, pp. 315–323, 2011.
[27] R. Wiest, A. Krag, and A. Gerbes, “Spontaneous bacterial peritonitis: Recent guidelines and beyond,” Gut, vol. 61, no. 2, pp. 297–310, 2012.
[28] “A B7CEDGF HIB7PRQTSUDGQICWVYX HIB edCdSISIXvg5r ` CdQTw XvefCdS,” proc. IEEE, 1998.
[29] F.-F. Li, J. Johnson, and S. Yeung, “CNN architecture comparison,” 2017.
[30] A. F. Agarap, “Deep Learning using Rectified Linear Units (ReLU),” no. 1, pp. 2–8, 2018.
[31] C. Poulet, J. Y. Han, and Y. LeCun, “CNP: An FPGA-based Processor for Convolutional Networks,” Pattern Recognit., vol. 26, no. 11, pp. 2004–2005, 2007.
[32] Y. Li et al., “A 34-FPS 698-GOP/s/W Binarized Deep Neural Network-based Natural Scene Text Interpretation Accelerator for Mobile Edge,” IEEE Trans. Ind. Electron., vol. PP, no. c, p. 1, 2018.
[33] C. Farabet, Y. Lecun, and E. Culurciello, “NeuFlow: A Runtime Reconfigurable Dataflow Architecture for Vision,” Cvpr, pp. 2–4, 2011.

