
National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)


Detailed Record

Student: 吳奕亨
Student (English): Yi-Heng Wu
Title (Chinese): 利用向量量化壓縮卷積神經網路之實作及加速器設計
Title (English): Compressing Convolutional Neural Network by Vector Quantization: Implementation and Accelerator Design
Advisor: 簡韶逸
Advisor (English): Shao-Yi Chien
Oral Defense Date: 2017-07-21
Degree: Master's
Institution: National Taiwan University
Department: Graduate Institute of Electronics Engineering
Discipline: Engineering
Field: Electrical Engineering and Computer Science
Document Type: Academic thesis
Publication Year: 2017
Graduation Academic Year: 105
Language: English
Pages: 47
Keywords (Chinese): 卷積神經網路, 向量量化, 加速器
Keywords (English): Convolutional Neural Network, Vector Quantization, Accelerator
Statistics:
  • Cited: 0
  • Views: 429
  • Downloads: 0
  • Bookmarked: 1
In recent years, convolutional neural networks (CNNs) have achieved breakthrough results in many fields related to computer vision. However, because of their enormous computational load and hardware requirements, they run mainly on graphics processing units and cannot be implemented effectively on small devices such as mobile phones. Within a CNN, the convolutional layers and fully-connected layers demand the most hardware resources.

In this thesis we therefore accelerate these two kinds of layers, with a method that consists of two parts. We first compress the CNN with a vector quantization algorithm; our implementation results show that the originally large network can be compressed to under 10% of its original size while the computation is reduced by a factor of three to four.
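
The record itself contains no code, but the compression step described above is easy to illustrate. The following NumPy sketch (our own illustration under assumed sizes, not the thesis's implementation; the names `kmeans` and `quantize_layer` are hypothetical) quantizes a weight matrix by clustering its 4-wide sub-vectors with k-means, so that only a small codebook plus 8-bit indices must be stored:

```python
import numpy as np

def kmeans(vectors, k, iters=20, seed=0):
    """Minimal k-means; returns (codebook, assignment indices)."""
    rng = np.random.default_rng(seed)
    codebook = vectors[rng.choice(len(vectors), k, replace=False)].copy()
    for _ in range(iters):
        # Squared distances via the expansion |v|^2 - 2 v.c + |c|^2.
        d = ((vectors ** 2).sum(1)[:, None]
             - 2.0 * vectors @ codebook.T
             + (codebook ** 2).sum(1)[None, :])
        idx = d.argmin(1)
        for c in range(k):                      # update non-empty clusters only
            members = vectors[idx == c]
            if len(members):
                codebook[c] = members.mean(0)
    return codebook, idx

def quantize_layer(W, sub_dim=4, k=256):
    """Quantize W by clustering its sub_dim-wide sub-vectors."""
    rows, cols = W.shape
    assert cols % sub_dim == 0
    subvecs = W.reshape(-1, sub_dim)            # (rows * cols/sub_dim, sub_dim)
    codebook, idx = kmeans(subvecs, k)
    return codebook, idx.reshape(rows, cols // sub_dim).astype(np.uint8)

# Toy example: a 256 x 1024 float32 weight matrix (1 MiB).
W = np.random.randn(256, 1024).astype(np.float32)
codebook, idx = quantize_layer(W)
compressed = codebook.nbytes + idx.nbytes       # codebook + one uint8 index per sub-vector
print(compressed / W.nbytes)                    # about 0.066, i.e. under 10%
```

With a 256-entry codebook and 4-element sub-vectors, each 16-byte sub-vector is replaced by a single byte of index, which is the kind of ratio from which a sub-10% model size, as reported above, can come.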

For the compressed network we then design hardware that accelerates it while lowering the amount of memory it must read. This includes different memory access schemes that reduce memory traffic, and an index buffer that raises the utilization of the processing elements.
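
Why does vector quantization also cut computation, and what does an index buffer feed to the processing elements? Here is a hedged software analogue (the function name and all sizes are our assumptions; the actual design is hardware, not Python): for a VQ-compressed fully-connected layer, the input's sub-vectors are multiplied against the codebook exactly once, after which every output neuron is computed purely by index-driven table lookups and additions.

```python
import numpy as np

def vq_fc_forward(x, codebook, idx):
    """Forward pass through a VQ-compressed fully-connected layer.

    x:        input activations, shape (cols,)
    codebook: k codewords, shape (k, sub_dim)
    idx:      codeword index per weight sub-vector, shape (rows, cols // sub_dim)
    """
    k, sub_dim = codebook.shape
    n_sub = x.size // sub_dim
    x_sub = x.reshape(n_sub, sub_dim)
    # Shared precomputation: k * cols multiplies in total, instead of the
    # rows * cols multiplies of a dense layer -- a large saving when rows >> k.
    lut = x_sub @ codebook.T                                 # (n_sub, k)
    # Each output is n_sub lookups + adds; in hardware, the index stream
    # addressing `lut` is what an index buffer would keep the PEs fed with.
    return lut[np.arange(n_sub)[None, :], idx].sum(axis=1)  # (rows,)

# Sanity check against the dense layer rebuilt from the codebook.
rows, cols, sub_dim, k = 4096, 1024, 4, 256
rng = np.random.default_rng(0)
codebook = rng.standard_normal((k, sub_dim)).astype(np.float32)
idx = rng.integers(0, k, size=(rows, cols // sub_dim))
W_hat = codebook[idx].reshape(rows, cols)                    # dequantized weights
x = rng.standard_normal(cols).astype(np.float32)
assert np.allclose(vq_fc_forward(x, codebook, idx), W_hat @ x, atol=1e-2)
```

The multiply count drops from rows * cols to k * cols; together with the cheaper lookup-accumulate work, this is consistent with the three-to-four-fold computation reduction reported above.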

Our method optimizes the execution of both the convolutional and fully-connected layers. Through design space exploration (DSE) and software simulation, our design achieves lower memory traffic and higher speed than existing CNN accelerators.
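
The DSE chapter itself is not reproduced on this record page. Purely as a toy illustration of the kind of first-order DRAM-access analysis such an exploration involves, the sketch below sweeps one tiling parameter of a convolutional layer under a fixed on-chip buffer budget; the cost model, the buffer size, and every parameter are our assumptions, not the thesis's numbers.

```python
def conv_dram_traffic(H, C, M, K, tile_m, buf_bytes=108 * 1024):
    """First-order DRAM traffic (bytes, assuming 8-bit elements) for an
    H x H x C input convolved with M KxKxC filters, processing tile_m
    output channels per pass. Toy model: the input is re-read once per
    pass; weights are read once and outputs written once."""
    passes = -(-M // tile_m)                 # ceiling division
    input_bytes = passes * H * H * C         # re-fetched every pass
    weight_bytes = M * K * K * C             # each filter read once
    output_bytes = H * H * M                 # each output written once
    # The buffer must hold one pass's filters plus a K-row input strip.
    need = tile_m * K * K * C + K * H * C
    if need > buf_bytes:
        return None                          # this tiling does not fit
    return input_bytes + weight_bytes + output_bytes

# Sweep the tile size for a VGG-like layer (H=56, C=128, M=256, K=3).
candidates = {t: conv_dram_traffic(56, 128, 256, 3, t) for t in (1, 2, 4, 8, 16, 32)}
feasible = [t for t, v in candidates.items() if v is not None]
best = min(feasible, key=lambda t: candidates[t])
print(best, candidates[best])
```

A real exploration would sweep several loop-tiling and dataflow parameters jointly and validate the analytical model with cycle-accurate simulation, as the thesis's DSE chapter does via software simulation.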
In recent years, deep convolutional neural networks (CNNs) have achieved ground-breaking success in many computer vision research fields. Due to their large model size and tremendous computation, CNNs cannot be efficiently executed on small devices such as mobile phones. Although several hardware accelerator architectures have been developed, most of them can only efficiently address one of the two major layer types in a CNN: the convolutional (CONV) and fully-connected (FC) layers. In this thesis, based on algorithm-architecture co-exploration, our architecture targets executing both layer types with high efficiency. The vector quantization technique is first selected to compress the parameters, reduce the computation, and unify the behaviors of both CONV and FC layers. To fully exploit the gain of vector quantization, we then propose an accelerator architecture for quantized CNNs. Different DRAM access schemes are employed to reduce DRAM access and, consequently, power consumption. We also design a high-throughput processing element architecture to accelerate quantized layers. Compared with state-of-the-art CNN accelerators, the proposed architecture achieves 1.2-5x less DRAM access and 1.5-5x higher throughput for both CONV and FC layers.
Abstract (Chinese)
Abstract
1. Introduction
1.1 Convolutional Neural Network
1.2 Motivation
1.3 Hardware Design Challenge
1.4 Contribution
1.5 Thesis Organization
2. Background and Related Works
2.1 Layers
2.2 Neural Network Model Refinement
2.2.1 Network Pruning
2.2.2 Quantization
2.2.3 Training from Scratch
2.2.4 Summary
2.3 Neural Network Accelerators
2.3.1 DianNao (ASPLOS'14) [1]
2.3.2 Eyeriss (ISCA'16) [2]
2.3.3 Accelerators for Compressed Networks
3. Vector Quantization
3.1 Introduction
3.2 Comparison
3.3 Vector Quantization
3.4 Testing on Vector Quantized Layers
3.5 Error Correction
3.6 Experiment
4. Architecture Design
4.1 Introduction
4.2 Architecture Overview
4.3 DRAM Access Scheme
4.4 Processing Element (PE)
5. Design Space Exploration (DSE)
5.1 Specification
5.2 DRAM Access Analysis
5.3 Computation Time Analysis
6. Conclusion
Bibliography
[1] T. Chen, Z. Du, N. Sun, J. Wang, C. Wu, Y. Chen, and O. Temam, "DianNao: A small-footprint high-throughput accelerator for ubiquitous machine-learning," in ACM SIGPLAN Notices, vol. 49, no. 4. ACM, 2014, pp. 269–284.

[2] Y.-H. Chen, T. Krishna, J. S. Emer, and V. Sze, "Eyeriss: An energy-efficient reconfigurable accelerator for deep convolutional neural networks," IEEE Journal of Solid-State Circuits, vol. 52, no. 1, pp. 127–138, 2017.

[3] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," in Advances in Neural Information Processing Systems, 2012, pp. 1097–1105.

[4] S. Han, J. Pool, J. Tran, and W. Dally, "Learning both weights and connections for efficient neural network," in Advances in Neural Information Processing Systems, 2015, pp. 1135–1143.

[5] F. N. Iandola, S. Han, M. W. Moskewicz, K. Ashraf, W. J. Dally, and K. Keutzer, "SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5 MB model size," arXiv preprint arXiv:1602.07360, 2016.

[6] S. Han, X. Liu, H. Mao, J. Pu, A. Pedram, M. A. Horowitz, and W. J. Dally, "EIE: Efficient inference engine on compressed deep neural network," in Proceedings of the 43rd International Symposium on Computer Architecture. IEEE Press, 2016, pp. 243–254.

[7] J. Wu, C. Leng, Y. Wang, Q. Hu, and J. Cheng, "Quantized convolutional neural networks for mobile devices," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 4820–4828.

[8] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," arXiv preprint arXiv:1409.1556, 2014.

[9] R. Girshick, J. Donahue, T. Darrell, and J. Malik, "Rich feature hierarchies for accurate object detection and semantic segmentation," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 580–587.

[10] J. Kim, J. Kwon Lee, and K. Mu Lee, "Accurate image super-resolution using very deep convolutional networks," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 1646–1654.

[11] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, "Generative adversarial nets," in Advances in Neural Information Processing Systems, 2014, pp. 2672–2680.

[12] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778.

[13] G. Klambauer, T. Unterthiner, A. Mayr, and S. Hochreiter, "Self-normalizing neural networks," 2017.

[14] Z. Du, R. Fasthuber, T. Chen, P. Ienne, L. Li, T. Luo, X. Feng, Y. Chen, and O. Temam, "ShiDianNao: Shifting vision processing closer to the sensor," in ACM SIGARCH Computer Architecture News, vol. 43, no. 3. ACM, 2015, pp. 92–104.

[15] Y. Chen, T. Luo, S. Liu, S. Zhang, L. He, J. Wang, L. Li, T. Chen, Z. Xu, N. Sun et al., "DaDianNao: A machine-learning supercomputer," in Proceedings of the 47th Annual IEEE/ACM International Symposium on Microarchitecture. IEEE Computer Society, 2014, pp. 609–622.

[16] B. Hassibi and D. G. Stork, "Second order derivatives for network pruning: Optimal Brain Surgeon," in Advances in Neural Information Processing Systems, 1993, pp. 164–171.

[17] S. Srinivas and R. V. Babu, "Data-free parameter pruning for deep neural networks," arXiv preprint arXiv:1507.06149, 2015.

[18] S. Han, H. Mao, and W. J. Dally, "Deep compression: Compressing deep neural networks with pruning, trained quantization and Huffman coding," arXiv preprint arXiv:1510.00149, 2015.

[19] Y. Sun, X. Wang, and X. Tang, "Sparsifying neural network connections for face recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 4856–4864.

[20] S. Zhang, Z. Du, L. Zhang, H. Lan, S. Liu, L. Li, Q. Guo, T. Chen, and Y. Chen, "Cambricon-X: An accelerator for sparse neural networks," in Microarchitecture (MICRO), 2016 49th Annual IEEE/ACM International Symposium on. IEEE, 2016, pp. 1–12.

[21] J. Albericio, P. Judd, T. Hetherington, T. Aamodt, N. E. Jerger, and A. Moshovos, "Cnvlutin: Ineffectual-neuron-free deep neural network computing," in Computer Architecture (ISCA), 2016 ACM/IEEE 43rd Annual International Symposium on. IEEE, 2016, pp. 1–13.

[22] S. Gupta, A. Agrawal, K. Gopalakrishnan, and P. Narayanan, "Deep learning with limited numerical precision," in Proceedings of the 32nd International Conference on Machine Learning (ICML-15), 2015, pp. 1737–1746.

[23] S. Anwar, K. Hwang, and W. Sung, "Fixed point optimization of deep convolutional neural networks for object recognition," in Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on. IEEE, 2015, pp. 1131–1135.

[24] B.-S. Yu, "Architecture design of convolutional neural networks for face detection on FPGA platforms," Master Thesis, National Taiwan University, 2016.

[25] M. Rastegari, V. Ordonez, J. Redmon, and A. Farhadi, "XNOR-Net: ImageNet classification using binary convolutional neural networks," in European Conference on Computer Vision. Springer, 2016, pp. 525–542.

[26] E. L. Denton, W. Zaremba, J. Bruna, Y. LeCun, and R. Fergus, "Exploiting linear structure within convolutional networks for efficient evaluation," in Advances in Neural Information Processing Systems, 2014, pp. 1269–1277.

[27] W. Chen, J. Wilson, S. Tyree, K. Weinberger, and Y. Chen, "Compressing neural networks with the hashing trick," in International Conference on Machine Learning, 2015, pp. 2285–2294.

[28] G. Hinton, O. Vinyals, and J. Dean, "Distilling the knowledge in a neural network," arXiv preprint arXiv:1503.02531, 2015.

[29] B. Reagen, P. Whatmough, R. Adolf, S. Rama, H. Lee, S. K. Lee, J. M. Hernández-Lobato, G.-Y. Wei, and D. Brooks, "Minerva: Enabling low-power, highly-accurate deep neural network accelerators," in Proceedings of the 43rd International Symposium on Computer Architecture. IEEE Press, 2016, pp. 267–278.

[30] C. Farabet, B. Martini, B. Corda, P. Akselrod, E. Culurciello, and Y. LeCun, "NeuFlow: A runtime reconfigurable dataflow processor for vision," in Computer Vision and Pattern Recognition Workshops (CVPRW), 2011 IEEE Computer Society Conference on. IEEE, 2011, pp. 109–116.

[31] D. Liu, T. Chen, S. Liu, J. Zhou, S. Zhou, O. Teman, X. Feng, X. Zhou, and Y. Chen, "PuDianNao: A polyvalent machine learning accelerator," in ACM SIGARCH Computer Architecture News, vol. 43, no. 1. ACM, 2015, pp. 369–381.

[32] Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell, "Caffe: Convolutional architecture for fast feature embedding," arXiv preprint arXiv:1408.5093, 2014.

[33] N. P. Jouppi, C. Young, N. Patil, D. Patterson, G. Agrawal, R. Bajwa, S. Bates, S. Bhatia, N. Boden, A. Borchers et al., "In-datacenter performance analysis of a tensor processing unit," arXiv preprint arXiv:1704.04760, 2017.