
National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)


Detailed Record

Student: 吳奕亨
Student (English): Yi-Heng Wu
Title (Chinese): 利用向量量化壓縮卷積神經網路之實作及加速器設計
Title (English): Compressing Convolutional Neural Network by Vector Quantization: Implementation and Accelerator Design
Advisor: 簡韶逸
Advisor (English): Shao-Yi Chien
Oral Defense Date: 2017-07-21
Degree: Master's
Institution: National Taiwan University
Department: Graduate Institute of Electronics Engineering
Discipline: Engineering
Field: Electrical Engineering and Computer Science
Document Type: Academic thesis
Publication Year: 2017
Graduation Academic Year: 105
Language: English
Pages: 47
Keywords (Chinese): 卷積神經網路, 向量量化, 加速器
Keywords (English): Convolutional Neural Network, Vector Quantization, Accelerator
Statistics:
  • Cited: 0
  • Views: 429
  • Downloads: 0
  • Bookmarked: 1
In recent years, convolutional neural networks (CNNs) have achieved breakthrough results in many fields related to computer vision. However, because of their enormous computational load and hardware requirements, they run mainly on graphics processing units and cannot be implemented effectively on small devices such as mobile phones. Within a CNN, the convolutional layers and fully-connected layers demand the most hardware resources.

In this thesis we therefore accelerate these two kinds of layers, with a method that consists of two parts. We first compress the CNN with a vector quantization algorithm; our implementation results show that the originally large network can be compressed to under 10% of its original size while the computation is reduced by a factor of three to four.
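
The record itself contains no code, but the compression step described above is easy to illustrate. The following NumPy sketch (our own illustration under assumed sizes, not the thesis's implementation; the names `kmeans` and `quantize_layer` are hypothetical) quantizes a weight matrix by clustering its 4-wide sub-vectors with k-means, so that only a small codebook plus 8-bit indices must be stored:

```python
import numpy as np

def kmeans(vectors, k, iters=20, seed=0):
    """Minimal k-means; returns (codebook, assignment indices)."""
    rng = np.random.default_rng(seed)
    codebook = vectors[rng.choice(len(vectors), k, replace=False)].copy()
    for _ in range(iters):
        # Squared distances via the expansion |v|^2 - 2 v.c + |c|^2.
        d = ((vectors ** 2).sum(1)[:, None]
             - 2.0 * vectors @ codebook.T
             + (codebook ** 2).sum(1)[None, :])
        idx = d.argmin(1)
        for c in range(k):                      # update non-empty clusters only
            members = vectors[idx == c]
            if len(members):
                codebook[c] = members.mean(0)
    return codebook, idx

def quantize_layer(W, sub_dim=4, k=256):
    """Quantize W by clustering its sub_dim-wide sub-vectors."""
    rows, cols = W.shape
    assert cols % sub_dim == 0
    subvecs = W.reshape(-1, sub_dim)            # (rows * cols/sub_dim, sub_dim)
    codebook, idx = kmeans(subvecs, k)
    return codebook, idx.reshape(rows, cols // sub_dim).astype(np.uint8)

# Toy example: a 256 x 1024 float32 weight matrix (1 MiB).
W = np.random.randn(256, 1024).astype(np.float32)
codebook, idx = quantize_layer(W)
compressed = codebook.nbytes + idx.nbytes       # codebook + one uint8 index per sub-vector
print(compressed / W.nbytes)                    # about 0.066, i.e. under 10%
```

With a 256-entry codebook and 4-element sub-vectors, each 16-byte sub-vector is replaced by a single byte of index, which is the kind of ratio from which a sub-10% model size, as reported above, can come.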

For the compressed network we then design hardware that accelerates it while lowering the amount of memory it must read. This includes different memory access schemes that reduce memory traffic, and an index buffer that raises the utilization of the processing elements.
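
Why does vector quantization also cut computation, and what does an index buffer feed to the processing elements? Here is a hedged software analogue (the function name and all sizes are our assumptions; the actual design is hardware, not Python): for a VQ-compressed fully-connected layer, the input's sub-vectors are multiplied against the codebook exactly once, after which every output neuron is computed purely by index-driven table lookups and additions.

```python
import numpy as np

def vq_fc_forward(x, codebook, idx):
    """Forward pass through a VQ-compressed fully-connected layer.

    x:        input activations, shape (cols,)
    codebook: k codewords, shape (k, sub_dim)
    idx:      codeword index per weight sub-vector, shape (rows, cols // sub_dim)
    """
    k, sub_dim = codebook.shape
    n_sub = x.size // sub_dim
    x_sub = x.reshape(n_sub, sub_dim)
    # Shared precomputation: k * cols multiplies in total, instead of the
    # rows * cols multiplies of a dense layer -- a large saving when rows >> k.
    lut = x_sub @ codebook.T                                 # (n_sub, k)
    # Each output is n_sub lookups + adds; in hardware, the index stream
    # addressing `lut` is what an index buffer would keep the PEs fed with.
    return lut[np.arange(n_sub)[None, :], idx].sum(axis=1)  # (rows,)

# Sanity check against the dense layer rebuilt from the codebook.
rows, cols, sub_dim, k = 4096, 1024, 4, 256
rng = np.random.default_rng(0)
codebook = rng.standard_normal((k, sub_dim)).astype(np.float32)
idx = rng.integers(0, k, size=(rows, cols // sub_dim))
W_hat = codebook[idx].reshape(rows, cols)                    # dequantized weights
x = rng.standard_normal(cols).astype(np.float32)
assert np.allclose(vq_fc_forward(x, codebook, idx), W_hat @ x, atol=1e-2)
```

The multiply count drops from rows * cols to k * cols; together with the cheaper lookup-accumulate work, this is consistent with the three-to-four-fold computation reduction reported above.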

Our method optimizes the execution of both the convolutional and fully-connected layers. Through design space exploration (DSE) and software simulation, our design achieves lower memory traffic and higher speed than existing CNN accelerators.
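
The DSE chapter itself is not reproduced on this record page. Purely as a toy illustration of the kind of first-order DRAM-access analysis such an exploration involves, the sketch below sweeps one tiling parameter of a convolutional layer under a fixed on-chip buffer budget; the cost model, the buffer size, and every parameter are our assumptions, not the thesis's numbers.

```python
def conv_dram_traffic(H, C, M, K, tile_m, buf_bytes=108 * 1024):
    """First-order DRAM traffic (bytes, assuming 8-bit elements) for an
    H x H x C input convolved with M KxKxC filters, processing tile_m
    output channels per pass. Toy model: the input is re-read once per
    pass; weights are read once and outputs written once."""
    passes = -(-M // tile_m)                 # ceiling division
    input_bytes = passes * H * H * C         # re-fetched every pass
    weight_bytes = M * K * K * C             # each filter read once
    output_bytes = H * H * M                 # each output written once
    # The buffer must hold one pass's filters plus a K-row input strip.
    need = tile_m * K * K * C + K * H * C
    if need > buf_bytes:
        return None                          # this tiling does not fit
    return input_bytes + weight_bytes + output_bytes

# Sweep the tile size for a VGG-like layer (H=56, C=128, M=256, K=3).
candidates = {t: conv_dram_traffic(56, 128, 256, 3, t) for t in (1, 2, 4, 8, 16, 32)}
feasible = [t for t, v in candidates.items() if v is not None]
best = min(feasible, key=lambda t: candidates[t])
print(best, candidates[best])
```

A real exploration would sweep several loop-tiling and dataflow parameters jointly and validate the analytical model with cycle-accurate simulation, as the thesis's DSE chapter does via software simulation.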
In recent years, deep convolutional neural networks (CNNs) have achieved ground-breaking success in many computer vision research fields. Due to their large model size and tremendous computation, CNNs cannot be efficiently executed on small devices such as mobile phones. Although several hardware accelerator architectures have been developed, most of them can only efficiently address one of the two major layer types in a CNN: the convolutional (CONV) and fully-connected (FC) layers. In this thesis, based on algorithm-architecture co-exploration, our architecture targets executing both layer types with high efficiency. The vector quantization technique is first selected to compress the parameters, reduce the computation, and unify the behaviors of both CONV and FC layers. To fully exploit the gain of vector quantization, we then propose an accelerator architecture for quantized CNNs. Different DRAM access schemes are employed to reduce DRAM access and, consequently, power consumption. We also design a high-throughput processing element architecture to accelerate quantized layers. Compared with state-of-the-art CNN accelerators, the proposed architecture achieves 1.2-5x less DRAM access and 1.5-5x higher throughput for both CONV and FC layers.
Abstract (Chinese)
Abstract
1. Introduction
1.1 Convolutional Neural Network
1.2 Motivation
1.3 Hardware Design Challenge
1.4 Contribution
1.5 Thesis Organization
2. Background and Related Works
2.1 Layers
2.2 Neural Network Model Refinement
2.2.1 Network Pruning
2.2.2 Quantization
2.2.3 Training from Scratch
2.2.4 Summary
2.3 Neural Network Accelerators
2.3.1 DianNao (ASPLOS'14) [1]
2.3.2 Eyeriss (ISCA'16) [2]
2.3.3 Accelerators for Compressed Networks
3. Vector Quantization
3.1 Introduction
3.2 Comparison
3.3 Vector Quantization
3.4 Testing on Vector Quantized Layers
3.5 Error Correction
3.6 Experiment
4. Architecture Design
4.1 Introduction
4.2 Architecture Overview
4.3 DRAM Access Scheme
4.4 Processing Element (PE)
5. Design Space Exploration (DSE)
5.1 Specification
5.2 DRAM Access Analysis
5.3 Computation Time Analysis
6. Conclusion
Bibliography
[1] T. Chen, Z. Du, N. Sun, J. Wang, C. Wu, Y. Chen, and O. Temam, "DianNao: A small-footprint high-throughput accelerator for ubiquitous machine-learning," in ACM SIGPLAN Notices, vol. 49, no. 4. ACM, 2014, pp. 269–284.

[2] Y.-H. Chen, T. Krishna, J. S. Emer, and V. Sze, "Eyeriss: An energy-efficient reconfigurable accelerator for deep convolutional neural networks," IEEE Journal of Solid-State Circuits, vol. 52, no. 1, pp. 127–138, 2017.

[3] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," in Advances in Neural Information Processing Systems, 2012, pp. 1097–1105.

[4] S. Han, J. Pool, J. Tran, and W. Dally, "Learning both weights and connections for efficient neural network," in Advances in Neural Information Processing Systems, 2015, pp. 1135–1143.

[5] F. N. Iandola, S. Han, M. W. Moskewicz, K. Ashraf, W. J. Dally, and K. Keutzer, "SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5 MB model size," arXiv preprint arXiv:1602.07360, 2016.

[6] S. Han, X. Liu, H. Mao, J. Pu, A. Pedram, M. A. Horowitz, and W. J. Dally, "EIE: Efficient inference engine on compressed deep neural network," in Proceedings of the 43rd International Symposium on Computer Architecture. IEEE Press, 2016, pp. 243–254.

[7] J. Wu, C. Leng, Y. Wang, Q. Hu, and J. Cheng, "Quantized convolutional neural networks for mobile devices," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 4820–4828.

[8] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," arXiv preprint arXiv:1409.1556, 2014.

[9] R. Girshick, J. Donahue, T. Darrell, and J. Malik, "Rich feature hierarchies for accurate object detection and semantic segmentation," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 580–587.

[10] J. Kim, J. Kwon Lee, and K. Mu Lee, "Accurate image super-resolution using very deep convolutional networks," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 1646–1654.

[11] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, "Generative adversarial nets," in Advances in Neural Information Processing Systems, 2014, pp. 2672–2680.

[12] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778.

[13] G. Klambauer, T. Unterthiner, A. Mayr, and S. Hochreiter, "Self-normalizing neural networks," 2017.

[14] Z. Du, R. Fasthuber, T. Chen, P. Ienne, L. Li, T. Luo, X. Feng, Y. Chen, and O. Temam, "ShiDianNao: Shifting vision processing closer to the sensor," in ACM SIGARCH Computer Architecture News, vol. 43, no. 3. ACM, 2015, pp. 92–104.

[15] Y. Chen, T. Luo, S. Liu, S. Zhang, L. He, J. Wang, L. Li, T. Chen, Z. Xu, N. Sun et al., "DaDianNao: A machine-learning supercomputer," in Proceedings of the 47th Annual IEEE/ACM International Symposium on Microarchitecture. IEEE Computer Society, 2014, pp. 609–622.

[16] B. Hassibi and D. G. Stork, "Second order derivatives for network pruning: Optimal Brain Surgeon," in Advances in Neural Information Processing Systems, 1993, pp. 164–171.

[17] S. Srinivas and R. V. Babu, "Data-free parameter pruning for deep neural networks," arXiv preprint arXiv:1507.06149, 2015.

[18] S. Han, H. Mao, and W. J. Dally, "Deep compression: Compressing deep neural networks with pruning, trained quantization and Huffman coding," arXiv preprint arXiv:1510.00149, 2015.

[19] Y. Sun, X. Wang, and X. Tang, "Sparsifying neural network connections for face recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 4856–4864.

[20] S. Zhang, Z. Du, L. Zhang, H. Lan, S. Liu, L. Li, Q. Guo, T. Chen, and Y. Chen, "Cambricon-X: An accelerator for sparse neural networks," in Microarchitecture (MICRO), 2016 49th Annual IEEE/ACM International Symposium on. IEEE, 2016, pp. 1–12.

[21] J. Albericio, P. Judd, T. Hetherington, T. Aamodt, N. E. Jerger, and A. Moshovos, "Cnvlutin: Ineffectual-neuron-free deep neural network computing," in Computer Architecture (ISCA), 2016 ACM/IEEE 43rd Annual International Symposium on. IEEE, 2016, pp. 1–13.

[22] S. Gupta, A. Agrawal, K. Gopalakrishnan, and P. Narayanan, "Deep learning with limited numerical precision," in Proceedings of the 32nd International Conference on Machine Learning (ICML-15), 2015, pp. 1737–1746.

[23] S. Anwar, K. Hwang, and W. Sung, "Fixed point optimization of deep convolutional neural networks for object recognition," in Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on. IEEE, 2015, pp. 1131–1135.

[24] B.-S. Yu, "Architecture design of convolutional neural networks for face detection on FPGA platforms," Master Thesis, National Taiwan University, 2016.

[25] M. Rastegari, V. Ordonez, J. Redmon, and A. Farhadi, "XNOR-Net: ImageNet classification using binary convolutional neural networks," in European Conference on Computer Vision. Springer, 2016, pp. 525–542.

[26] E. L. Denton, W. Zaremba, J. Bruna, Y. LeCun, and R. Fergus, "Exploiting linear structure within convolutional networks for efficient evaluation," in Advances in Neural Information Processing Systems, 2014, pp. 1269–1277.

[27] W. Chen, J. Wilson, S. Tyree, K. Weinberger, and Y. Chen, "Compressing neural networks with the hashing trick," in International Conference on Machine Learning, 2015, pp. 2285–2294.

[28] G. Hinton, O. Vinyals, and J. Dean, "Distilling the knowledge in a neural network," arXiv preprint arXiv:1503.02531, 2015.

[29] B. Reagen, P. Whatmough, R. Adolf, S. Rama, H. Lee, S. K. Lee, J. M. Hernández-Lobato, G.-Y. Wei, and D. Brooks, "Minerva: Enabling low-power, highly-accurate deep neural network accelerators," in Proceedings of the 43rd International Symposium on Computer Architecture. IEEE Press, 2016, pp. 267–278.

[30] C. Farabet, B. Martini, B. Corda, P. Akselrod, E. Culurciello, and Y. LeCun, "NeuFlow: A runtime reconfigurable dataflow processor for vision," in Computer Vision and Pattern Recognition Workshops (CVPRW), 2011 IEEE Computer Society Conference on. IEEE, 2011, pp. 109–116.

[31] D. Liu, T. Chen, S. Liu, J. Zhou, S. Zhou, O. Teman, X. Feng, X. Zhou, and Y. Chen, "PuDianNao: A polyvalent machine learning accelerator," in ACM SIGARCH Computer Architecture News, vol. 43, no. 1. ACM, 2015, pp. 369–381.

[32] Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell, "Caffe: Convolutional architecture for fast feature embedding," arXiv preprint arXiv:1408.5093, 2014.

[33] N. P. Jouppi, C. Young, N. Patil, D. Patterson, G. Agrawal, R. Bajwa, S. Bates, S. Bhatia, N. Boden, A. Borchers et al., "In-datacenter performance analysis of a tensor processing unit," arXiv preprint arXiv:1704.04760, 2017.