
National Digital Library of Theses and Dissertations in Taiwan


Detailed Record

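Graduate student: 蘇沃杰
Graduate student (English name): ACHARJEE, SUVAJIT
Thesis title: 應用於二值化卷積神經網路之高效率硬體加速器
Thesis title (English): Hardware Efficient Accelerator for Binary Convolution Neural Network Inference
Advisor: 張添烜
Advisor (English name): Chang, Tian-Sheuan
Oral defense date: 2019-05-31
Degree: Master's
Institution: National Chiao Tung University (國立交通大學)
Department: EECS International Graduate Program (電機資訊國際學程)
Discipline: Engineering
Field: Electrical and computer engineering
Thesis type: Academic thesis
Publication year: 2019
Graduation academic year: 107
Language: English
Number of pages: 76
Keywords: Machine learning; FPGA; Convolution Neural Network; Binary Neural Network; Accelerator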
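Binary neural networks (BNNs) are a current research topic that keeps improving, with growing use in computer vision tasks such as recognition, object detection, and depth perception. However, most existing designs have low hardware utilization or are overly complex, which makes the hardware cost of the circuit too high. In addition, a large amount of computational redundancy remains in BNN inference. To overcome these problems of hardware utilization and computational complexity, this design adopts a systolic array architecture with binary inputs and weights. Because weights and activations can be stored as single bits (+1 stored as 1 and -1 stored as 0), computational complexity is greatly reduced. Replacing MAC operations with bitwise operations further addresses the computation problem. The design uses 8 PEs, each processing in parallel with its own accumulator, and each PE block performs convolution with a 3x3 kernel. Throughput is increased, with an operating frequency of at most 188.67 MHz and at least 125 MHz. Our results show that with 8 PEs the design reaches and supports 12.85 GOPS, with a 10x improvement in area efficiency compared with other results. The power consumption simulated after RTL synthesis is 14 mW. The architecture was successfully implemented in Xilinx ISE 14.7 on a Spartan 6 series FPGA. Compared with other state-of-the-art works, this design has better area and bandwidth efficiency.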
Binary neural networks (BNNs) are an active research topic that is steadily improving for computer vision tasks such as recognition, object detection, and depth perception. However, most existing designs suffer from low hardware utilization or complex circuits, which result in high hardware cost. In addition, a large amount of computational redundancy still exists in BNN inference. To address both hardware utilization and computational complexity, this design adopts a systolic array architecture that takes binarized inputs and weights. Computational complexity is drastically reduced because weights and activations can each be stored as a single bit: +1 is stored as 1, and -1 is stored as 0. The computation problem is further addressed by replacing MAC operations with bitwise operations. The design uses eight PEs, each operating in parallel with its own accumulator, and each PE block performs convolution with a 3x3 kernel. Throughput is increased, with an operating frequency of at most 188.67 MHz and at least 125 MHz. Our results show that with eight PEs the design achieves and supports 63.168 GOPS and is 10x more area efficient than other results. The power consumption obtained by simulation after RTL synthesis is 0.014 W. The architecture was successfully implemented in Xilinx ISE 14.7 on a Spartan 6 series FPGA. The design also shows better area and bandwidth efficiency than other state-of-the-art works.
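The single-bit encoding and the replacement of MAC operations by bitwise operations described in the abstract can be illustrated in software. The following is a minimal sketch, not the thesis's RTL: it assumes the common XNOR-plus-popcount formulation, in which the dot product of two length-n vectors of +1/-1 values equals 2 * (number of matching bits) - n. The function names `encode_pm1`, `xnor_popcount_dot`, and `mac_dot` are hypothetical and chosen only for this example.

```python
def encode_pm1(values):
    """Pack +1/-1 values into an int, one bit each (+1 -> 1, -1 -> 0)."""
    bits = 0
    for i, v in enumerate(values):
        if v == 1:
            bits |= 1 << i
        elif v != -1:
            raise ValueError("values must be +1 or -1")
    return bits


def xnor_popcount_dot(a_bits, w_bits, n):
    """Dot product of two packed +1/-1 vectors of length n.

    XNOR marks the positions where activation and weight agree
    (product +1); every other position contributes -1.
    """
    mask = (1 << n) - 1                  # keep only the n valid bits
    agree = ~(a_bits ^ w_bits) & mask    # bitwise XNOR
    matches = bin(agree).count("1")      # popcount
    return 2 * matches - n


def mac_dot(a, w):
    """Reference multiply-accumulate on the unpacked +1/-1 values."""
    return sum(x * y for x, y in zip(a, w))


# One 3x3 window and one 3x3 kernel, flattened row by row
# (illustrative values, not taken from the thesis).
window = [1, -1, 1, 1, -1, -1, 1, 1, -1]
kernel = [1, 1, -1, 1, -1, 1, -1, 1, -1]

bitwise_result = xnor_popcount_dot(encode_pm1(window), encode_pm1(kernel), 9)
reference_result = mac_dot(window, kernel)
print(bitwise_result, reference_result)
```

Both results agree for any pair of +1/-1 vectors, which is why a 3x3 binary convolution needs only a 9-bit XNOR and a bit count per output instead of nine multiplications.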
Table of Contents
List of Figures
List of Tables
Chapter 1. Introduction
1.1. Motivation
1.2. Thesis organization
Chapter 2. Overview of Binary Convolutional Neural Network
2.1. Overview of CNN algorithm
2.1.1. Convolutional layer
2.1.2. Pooling layer
2.1.3. Classification-Fully connected layer
2.1.4. Activation function
2.2. Popular networks
2.2.1. LeNet
2.2.2. AlexNet
2.2.3. ZFNet
2.2.4. VGGNet
2.2.5. GoogLeNet
2.2.6. ResNet
2.2.7. Summary
2.3. Challenges in CNN implementation
2.4. Related work
2.4.1. Background and Motivation
2.4.2. FPGA designs
Chapter 3. Hardware Design and Implementation
3.1. System architecture & data flow
3.1.1. Challenges of the architecture design
3.1.2. Processing Engine
3.1.3. XNOR operations
3.1.4. Accumulator Design
3.1.5. Max Pooling Design
3.1.6. Operations and waveform discussions
3.1.7. Placement, floor planning
3.1.8. Plan Ahead
Chapter 4. Results and Comparison
4.1. Result Evaluation
4.1.1. Prototype Implementation
4.1.2. Energy Efficiency
4.1.3. Frequency Analysis
4.1.4. Power Analysis
4.1.5. Comparison with prior works
The comparison here is made with the prior work [18], against which the efficiency of this design is 2x better. The comparison may not be entirely fair because the designs use different input sizes; our design's input size is 16x16 bits.
4.1.6. Layer by layer computational cost in VGG-16
4.1.7. Layer by layer computational cost in AlexNet
4.1.8. FPGA resource utilization summary
Chapter 5. Conclusion and Future Work
5.1. Conclusion
5.2. Future work
References
[1] D. Tomè, F. Monti, L. Baroffio, L. Bondi, M. Tagliasacchi, and S. Tubaro, “Deep convolutional neural networks for pedestrian detection,” 2015.
[2] P. Sermanet, D. Eigen, X. Zhang, M. Mathieu, R. Fergus, and Y. LeCun, “OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks,” 2013.
[3] R. M. Krauss and A. V. Nichols, “Metabolic Interrelationships of HDL Subclasses,” Lipoprotein Defic. Syndr., pp. 17–27, 2012.
[4] T. He, W. Huang, Y. Qiao, and J. Yao, “Text-Attentional Convolutional Neural Network for Scene Text Detection,” IEEE Trans. Image Process., vol. 25, no. 6, pp. 2529–2541, 2016.
[5] R. Girshick, J. Donahue, T. Darrell, and J. Malik, “Rich feature hierarchies for accurate object detection and semantic segmentation,” Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit., pp. 580–587, 2014.
[6] G. A. Hembury, V. V. Borovkov, J. M. Lintuluoto, and Y. Inoue, “Deep Residual Learning for Image Recognition Kaiming,” Chem. Lett., vol. 32, no. 5, pp. 428–429, 2003.
[7] S. Ji, M. Yang, K. Yu, and W. Xu, “3D convolutional neural networks for human action recognition,” ICML, Int. Conf. Mach. Learn., vol. 35, no. 1, pp. 221–31, 2010.
[8] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet Classification with Deep Convolutional Neural Networks,” Adv. Neural Inf. Process. Syst., pp. 1–9, 2012.
[9] K. Simonyan and A. Zisserman, “Very Deep Convolutional Networks for Large-Scale Image Recognition,” pp. 1–14, 2014.
[10] G. Smoluk, “Google net,” Mod. Plast., vol. 57, no. 3, pp. 62–63, 1980.
[11] H. Zhou, W. Ouyang, J. Cheng, X. Wang, and H. Li, “Deep Continuous Conditional Random Fields with Asymmetric Inter-object Constraints for Online Multi-object Tracking,” IEEE Trans. Circuits Syst. Video Technol., pp. 1–12, 2018.
[12] Q. Zhang, “Convolutional Neural Networks,” 3rd Int. Conf. Electromechanical Control Technol. Transp., 2018.
[13] F. Nagata et al., “Design Tool of Deep Convolutional Neural Network for Intelligent Visual Inspection,” IOP Conf. Ser. Mater. Sci. Eng., vol. 423, p. 012073, 2018.
[14] G. Cao, S. Ruan, Y. Peng, S. Huang, and N. Kwok, “Large-Complex-Surface Defect Detection by Hybrid Gradient Threshold Segmentation and Image Registration,” IEEE Access, vol. 6, no. May, pp. 36235–36246, 2018.
[15] W. Yang, “yangOLWcvpr16.”
[16] T. Pfister, J. Charles, and A. Zisserman, “Flowing ConvNets for Human Pose Estimation in Videos,” ICCV, 2015.
[17] O. Russakovsky et al., “ImageNet Large Scale Visual Recognition Challenge,” Int. J. Comput. Vis., vol. 115, no. 3, pp. 211–252, 2015.
[18] C. Fu, S. Zhu, H. Su, C.-E. Lee, and J. Zhao, “Towards Fast and Energy-Efficient Binarized Neural Network Inference on FPGA,” no. 2, 2018.
[19] J. Qiu et al., “Going Deeper with Embedded FPGA Platform for Convolutional Neural Network,” pp. 26–35, 2016.
[20] C. Zhang, P. Li, G. Sun, Y. Guan, B. Xiao, and J. Cong, “Optimizing FPGA-based Accelerator Design for Deep Convolutional Neural Networks,” Proc. 2015 ACM/SIGDA Int. Symp. Field-Programmable Gate Arrays - FPGA ’15, pp. 161–170, 2015.
[21] V. Gokhale, J. Jin, A. Dundar, B. Martini, and E. Culurciello, “A 240 G-ops/s mobile coprocessor for deep neural networks,” IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit. Work., pp. 696–701, 2014.
[22] M. Peemen, A. A. A. Setio, B. Mesman, and H. Corporaal, “Memory-centric accelerator design for convolutional neural networks,” 2013 IEEE 31st Int. Conf. Comput. Des. ICCD 2013, no. 2013, pp. 13–19, 2013.
[23] Y. Bengio, “Deep Learning tutorial 0.1,” NIPS, 2015.
[24] M. D. Zeiler and R. Fergus, “Visualizing and Understanding Convolutional Networks,” Anal. Chem. Res., no. ICLR, pp. 818–833, 2018.
[25] Y. J. Lin and T. S. Chang, “Data and Hardware Efficient Design for Convolutional Neural Network,” IEEE Trans. Circuits Syst. I Regul. Pap., vol. 65, no. 5, pp. 1642–1651, 2018.
[26] X. Glorot, A. Bordes, and Y. Bengio, “Deep Sparse Rectifier Neural Networks,” vol. 15, pp. 315–323, 2011.
[27] R. Wiest, A. Krag, and A. Gerbes, “Spontaneous bacterial peritonitis: Recent guidelines and beyond,” Gut, vol. 61, no. 2, pp. 297–310, 2012.
[28] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” Proc. IEEE, 1998.
[29] F.-F. Li, J. Johnson, and S. Yeung, “CNN architecture comparison,” 2017.
[30] A. F. Agarap, “Deep Learning using Rectified Linear Units (ReLU),” no. 1, pp. 2–8, 2018.
[31] C. Poulet, J. Y. Han, and Y. LeCun, “CNP: An FPGA-based Processor for Convolutional Networks,” Pattern Recognit., vol. 26, no. 11, pp. 2004–2005, 2007.
[32] Y. Li et al., “A 34-FPS 698-GOP/s/W Binarized Deep Neural Network-based Natural Scene Text Interpretation Accelerator for Mobile Edge,” IEEE Trans. Ind. Electron., vol. PP, no. c, p. 1, 2018.
[33] C. Farabet, Y. LeCun, and E. Culurciello, “NeuFlow: A Runtime Reconfigurable Dataflow Architecture for Vision,” CVPR, pp. 2–4, 2011.