National Digital Library of Theses and Dissertations in Taiwan

Detailed Record

Author: 沈恩禾
Author (English): En-Ho Shen
Title (Chinese): 低數值精確度捲積神經網路加速器之可重置化超大型積體電路設計
Title (English): Reconfigurable Low Arithmetic Precision Convolution Neural Network Accelerator VLSI Design and Implementation
Advisor: 簡韶逸 (Shao-Yi Chien)
Committee members: 蔡宗漢, 吳安宇, 楊家驤
Oral defense date: 2019-07-30
Degree: Master
Institution: National Taiwan University
Department: Graduate Institute of Electronics Engineering
Discipline: Engineering
Field: Electrical and Computer Engineering
Document type: Academic thesis
Year of publication: 2019
Academic year of graduation: 107
Language: Chinese
Pages: 58
Keywords (translated from Chinese): low numeric precision, convolutional neural network, accelerator, reconfigurable, VLSI design
DOI: 10.6342/NTU201902618
Statistics:
  • Cited by: 0
  • Views: 229
  • Rating: (none)
  • Downloads: 0
  • Bookmarked: 0
Abstract (translated from Chinese): In recent years, deep neural networks (DNNs) have achieved great success across artificial-intelligence applications. However, such models typically require bulky, power-hungry general-purpose GPUs (GPGPUs) to run, making them unsuitable for battery-powered devices such as mobile phones. In this thesis, we propose a VLSI design dedicated to computing quantized, low-precision convolutional neural networks (CNNs), which greatly reduces the energy consumed by cross-system data transfer and is particularly suited to accelerating neural networks on mobile devices. We first propose a simple and effective network quantization algorithm, together with a dataflow strategy that achieves a high data-reuse rate and fits such quantized networks. To exploit the full potential of the quantized data, we design a multiply-accumulate tree structure dedicated to low-precision operands; we then propose an on-chip buffer hierarchy and data re-alignment scheme that eliminates unnecessary data accesses and memory bank conflicts; finally, we present a processing-element array that accepts data broadcast from the buffers to each computing unit and writes completed results back to the global buffer in order. The proposed architecture supports the vast majority of CNN structures and can be reconfigured to the appropriate arithmetic precision, adapting to a variety of quantized network structures. The final design uses 180 KB of on-chip memory and 1340K logic gates.
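The quantization step described above maps floating-point weights to small integers plus a scale factor. The sketch below shows plain uniform symmetric quantization with a configurable bit width; the function name and the clipping-at-maximum rule are illustrative assumptions, since the thesis's own algorithm selects a loss-minimizing threshold rather than simply clipping at the largest magnitude.

```python
import numpy as np

def quantize_symmetric(w, bits):
    """Uniform symmetric quantization of a tensor to `bits`-bit integers.

    Illustrative sketch only: the threshold here is the max-magnitude
    value, not the loss-minimizing threshold proposed in the thesis.
    """
    qmax = 2 ** (bits - 1) - 1                  # e.g. 7 for 4-bit, 127 for 8-bit
    max_abs = np.max(np.abs(w))
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = np.clip(np.round(w / scale), -qmax, qmax).astype(np.int32)
    return q, scale                              # dequantize as q * scale

w = np.array([0.50, -0.25, 0.125, -1.0])
q4, s4 = quantize_symmetric(w, bits=4)          # integers in [-7, 7]
q8, s8 = quantize_symmetric(w, bits=8)          # integers in [-127, 127]
```

Reconfiguring the accelerator's precision corresponds to picking `bits` per layer or per model; the integer tensor and one scale per tensor are all that must be stored on chip.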
Abstract (English): Deep neural networks (DNNs) show promising results on various AI application tasks. However, such networks are typically executed on general-purpose GPUs that are bulky in form factor and consume hundreds of watts, which is unsuitable for mobile applications. In this thesis, we present a VLSI architecture that processes quantized low numeric-precision convolutional neural networks (CNNs), cutting the power consumed by memory access and speeding the model up within a limited area budget, making it particularly fit for mobile devices. We first propose a quantization re-training algorithm for training low-precision CNNs, then a dataflow with a high data-reuse rate and a multiply-accumulate strategy specially designed for such quantized models. To fully exploit the efficiency of computing with low-precision data, we design a micro-architecture for low bit-length multiplication and accumulation, an on-chip memory hierarchy and data re-alignment flow that saves power and avoids buffer bank conflicts, and a PE array that takes broadcast data from the buffer and sequentially sends finished results back to it. The architecture is highly flexible across CNN shapes and reconfigurable for low bit-length quantized models. The design was synthesized with 180 KB of on-chip memory capacity and an area of 1340K logic gates; the implementation results show state-of-the-art hardware efficiency.
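The sub-word parallelism mentioned above exploits the fact that low-precision products leave most of a wide multiplier unused, so several narrow multiplications can share one multiply. A minimal sketch of the general trick, assuming unsigned 4-bit operands (the thesis's actual reconfigurable ALU in Section 4.2.2 is a hardware design and may differ):

```python
def packed_mul_u4(a, x0, x1):
    """Multiply one unsigned 4-bit value `a` by two unsigned 4-bit
    activations x0 and x1 using a single wide multiplication.

    Each 4x4-bit product fits in 8 bits, so placing x1 eight bits above
    x0 keeps the two partial products from overlapping in the result.
    """
    assert 0 <= a < 16 and 0 <= x0 < 16 and 0 <= x1 < 16
    packed = (x1 << 8) | x0      # one 12-bit operand holding both inputs
    prod = a * packed            # single multiply, two products inside
    p0 = prod & 0xFF             # low 8 bits  -> a * x0
    p1 = (prod >> 8) & 0xFF      # next 8 bits -> a * x1
    return p0, p1

# both products recovered from one multiplication
assert packed_mul_u4(9, 7, 13) == (63, 117)   # 9*7 and 9*13
```

Signed operands and accumulation need guard bits between the packed fields so that carries from one product cannot corrupt its neighbor, which is exactly what makes precision-reconfigurable arithmetic units non-trivial to design.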
Abstract i
List of Figures v
List of Tables ix
1 Introduction 1
1.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2 Contribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.3 Thesis Outline . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
2 Related Work 5
2.1 Quantization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.1.1 Fixed-point quantization . . . . . . . . . . . . . . . . . . 6
2.1.2 Ternary and binary quantization . . . . . . . . . . . . . . 6
2.1.3 8-bit quantization on modern models . . . . . . . . . . . 8
2.2 Hardware design . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.2.1 Dataflow optimization: row stationary . . . . . . . . . . . 10
2.2.2 Precision reconfigurable and sub-word parallelism arithmetic unit . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.2.3 Bit-level re-configurable arithmetic unit . . . . . . . . . . 12
3 Low numeric precision convolution neural network 15
3.1 Convolutional Neural Networks . . . . . . . . . . . . . . . . . . 15
3.2 Low Precision CNN . . . . . . . . . . . . . . . . . . . . . . . . . 18
3.3 Quantization Loss Minimization Threshold Selection . . . . . . . 19
3.4 Computational consideration and data re-packing . . . . . . . . . 21
3.4.1 Data re-packing . . . . . . . . . . . . . . . . . . . . . . . 21
4 Proposed Architecture 25
4.1 System Overview . . . . . . . . . . . . . . . . . . . . . . . . . . 26
4.1.1 Output row stationary dataflow . . . . . . . . . . . . . . . 27
4.1.2 Data tiling . . . . . . . . . . . . . . . . . . . . . . . . . 29
4.1.3 Data re-alignment and buffer hierarchy . . . . . . . . . . 30
4.2 Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
4.2.1 PE processing pipeline . . . . . . . . . . . . . . . . . . . 32
4.2.2 Sub-word accumulation operation and re-configurable arithmetic logic unit . . . . . . . . . . . . . . . . . . . . . . . 36
4.2.3 Shift dispatcher . . . . . . . . . . . . . . . . . . . . . . . 42
4.2.4 Quantization . . . . . . . . . . . . . . . . . . . . . . . . 42
5 Results 43
5.1 Quantization error minimization training . . . . . . . . . . . . . . 43
5.2 Implementation results . . . . . . . . . . . . . . . . . . . . . . . 44
5.2.1 Area and power . . . . . . . . . . . . . . . . . . . . . . . 45
5.2.2 Experiments . . . . . . . . . . . . . . . . . . . . . . . . 46
6 Conclusion 53
References 55
[1] E. H. Lee, D. Miyashita, E. Chai, B. Murmann, and S. S. Wong, “LogNet: Energy-efficient neural networks using logarithmic computation,” in 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), March 2017, pp. 5900–5904.
[2] S. Han, H. Mao, and W. Dally, “Deep compression: Compressing deep neural networks with pruning, trained quantization and Huffman coding,” Oct. 2016.
[3] M. Rastegari, V. Ordonez, J. Redmon, and A. Farhadi, “XNOR-Net: ImageNet classification using binary convolutional neural networks,” arXiv e-prints, p. arXiv:1603.05279, Mar 2016.
[4] S. Migacz, “8-bit inference with TensorRT.” [Online]. Available: http://on-demand.gputechconf.com/gtc/2017/presentation/s7310-8-bit-inference-with-tensorrt.pdf
[5] Y. Chen, T. Krishna, J. Emer, and V. Sze, “14.5 Eyeriss: An energy-efficient reconfigurable accelerator for deep convolutional neural networks,” in 2016 IEEE International Solid-State Circuits Conference (ISSCC), Jan 2016, pp. 262–263. [Online]. Available: http://people.csail.mit.edu/emer/slides/2016.02.isscc.eyeriss.slides.pdf
[6] B. Moons and M. Verhelst, “A 0.3-2.6 TOPS/W precision-scalable processor for real-time large-scale ConvNets,” CoRR, vol. abs/1606.05094, 2016. [Online]. Available: http://arxiv.org/abs/1606.05094
[7] B. Moons, R. Uytterhoeven, W. Dehaene, and M. Verhelst, “14.5 Envision: A 0.26-to-10 TOPS/W subword-parallel dynamic-voltage-accuracy-frequency-scalable convolutional neural network processor in 28nm FDSOI,” in 2017 IEEE International Solid-State Circuits Conference (ISSCC), Feb 2017, pp. 246–247.
[8] H. Sharma, J. Park, N. Suda, L. Lai, B. Chau, J. K. Kim, V. Chandra, and H. Esmaeilzadeh, “Bit Fusion: Bit-level dynamically composable architecture for accelerating deep neural networks,” CoRR, vol. abs/1712.01507, 2017. [Online]. Available: http://arxiv.org/abs/1712.01507
[9] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” arXiv e-prints, p. arXiv:1512.03385, Dec 2015.
[10] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” in Advances in Neural Information Processing Systems 25, F. Pereira, C. J. C. Burges, L. Bottou, and K. Q. Weinberger, Eds. Curran Associates, Inc., 2012, pp. 1097–1105. [Online]. Available: http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf
[11] S. Han, X. Liu, H. Mao, J. Pu, A. Pedram, M. A. Horowitz, and W. J. Dally, “EIE: Efficient inference engine on compressed deep neural network,” arXiv e-prints, p. arXiv:1602.01528, Feb 2016.
[12] Y. Chen, J. Emer, and V. Sze, “Eyeriss: A spatial architecture for energy-efficient dataflow for convolutional neural networks,” in 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA), June 2016, pp. 367–379.
[13] J. Luo, J. Wu, and W. Lin, “ThiNet: A filter level pruning method for deep neural network compression,” CoRR, vol. abs/1707.06342, 2017. [Online]. Available: http://arxiv.org/abs/1707.06342
[14] A. G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto, and H. Adam, “MobileNets: Efficient convolutional neural networks for mobile vision applications,” CoRR, vol. abs/1704.04861, 2017. [Online]. Available: http://arxiv.org/abs/1704.04861
[15] X. Zhang, X. Zhou, M. Lin, and J. Sun, “ShuffleNet: An extremely efficient convolutional neural network for mobile devices,” CoRR, vol. abs/1707.01083, 2017. [Online]. Available: http://arxiv.org/abs/1707.01083
[16] S. Anwar, K. Hwang, and W. Sung, “Fixed point optimization of deep convolutional neural networks for object recognition,” 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1131–1135, 2015.
[17] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. S. Bernstein, A. C. Berg, and F. Li, “ImageNet large scale visual recognition challenge,” CoRR, vol. abs/1409.0575, 2014. [Online]. Available: http://arxiv.org/abs/1409.0575
[18] Y. LeCun and C. Cortes, “MNIST handwritten digit database,” 2010. [Online]. Available: http://yann.lecun.com/exdb/mnist/
[19] F. Li and B. Liu, “Ternary weight networks,” CoRR, vol. abs/1605.04711, 2016. [Online]. Available: http://arxiv.org/abs/1605.04711
[20] M. Courbariaux and Y. Bengio, “BinaryNet: Training deep neural networks with weights and activations constrained to +1 or -1,” CoRR, vol. abs/1602.02830, 2016. [Online]. Available: http://arxiv.org/abs/1602.02830
[21] A. Krizhevsky, V. Nair, and G. Hinton, “CIFAR-10 (Canadian Institute for Advanced Research).” [Online]. Available: http://www.cs.toronto.edu/~kriz/cifar.html
[22] S. Ioffe and C. Szegedy, “Batch normalization: Accelerating deep network training by reducing internal covariate shift,” CoRR, vol. abs/1502.03167, 2015. [Online]. Available: http://arxiv.org/abs/1502.03167
[23] johnjohnlin, “Nicotb, a Python-Verilog co-simulation framework.” [Online]. Available: https://github.com/johnjohnlin/nicotb