National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)


Detailed Record

Author: 林建宇
Author (English): Lin, Chien-Yu
Title (Chinese): Merlin: 一個有效利用神經元及權重稀疏性的類神經網路加速器設計
Title (English): Merlin: A Sparse Neural Network Accelerator Utilizing Both Neuron and Weight Sparsity
Advisor: 賴伯承
Advisor (English): Lai, Bo-Cheng
Committee members: 楊佳玲、張添烜、賴伯承
Committee members (English): Yang, Chia-Lin; Chang, Tian-Sheuan; Lai, Bo-Cheng
Oral defense date: 2017-05-25
Degree: Master's
Institution: National Chiao Tung University (國立交通大學)
Department: Institute of Electronics (電子研究所)
Discipline: Engineering
Field: Electrical and Computer Engineering
Document type: Academic thesis
Year of publication: 2017
Graduation academic year: 105 (2016-2017)
Language: English
Number of pages: 48
Keywords (Chinese): 深度類神經網路、加速器、稀疏性
Keywords (English): Deep neural network; Accelerator; Sparsity
Usage statistics:
  • Cited by: 0
  • Views: 324
  • Downloads: 6
  • Bookmarked: 0
Abstract (Chinese): Sparsity is widely observed in today's representative deep neural networks, because a large fraction of the neurons or weights can be zero without affecting accuracy. In a neural network accelerator, storing data in a sparse format saves a large amount of data transfer, and skipping the unnecessary computations yields further speedup. However, the non-zero values in a sparse format are irregularly distributed, which complicates access to specific data and is a major concern when adopting sparse formats. As a result, most published deep network accelerators still use dense formats, or exploit only one of neuron sparsity and weight sparsity. This thesis proposes Merlin, an accelerator that exploits neuron and weight sparsity at the same time. We adopt a simple yet efficient one-dimensional sparse format, and we design an efficient dual-indexing module that finds the commonly non-zero values in a pair of sparse data. Beyond sparsity, we also observe that the data-reuse priorities implied by the computation dataflow have a large impact on performance, so this thesis also gives a detailed analysis of the two computation dataflows the accelerator can adopt and shows that a suitable dataflow further improves execution efficiency.
Merlin is synthesized in a 40 nm process. Compared with a state-of-the-art sparse network accelerator, Merlin adds only 4.88% area; with the more efficient dataflow, Merlin reduces main-memory accesses by 17.03x and energy consumption by 14.46x, and improves EDP by 88.55x.
Abstract (English): Sparsity is widely observed in state-of-the-art deep neural networks (DNNs): a large portion of both neurons and weights can be zeroed without impairing the result quality. By storing a sparse DNN in a sparse format, the reduced data footprint considerably cuts down data movement in a DNN accelerator, and the computation of ineffectual neuron/weight pairs can be skipped to speed up processing. However, the irregular distribution of non-zero data complicates the indexing scheme used to identify non-zero elements, and this becomes one of the main design concerns. Most existing DNN accelerators either focus on dense neural networks or support only the weight sparsity known at compile time. In this thesis, we propose Merlin, a DNN accelerator that exploits both neuron and weight sparsity. A simple yet effective one-dimensional sparse format maintains both neurons and weights, and a Dual Indexing Module (DIM) skips the ineffectual neuron/weight pairs, forwarding only the effectual pairs for actual computation. Besides sparsity, we also observe that the computation dataflow of DNN execution has a huge performance impact and must be considered to attain superior performance. We conduct a comprehensive analysis of the computation dataflow on a sparse DNN and show that a proper computation dataflow further facilitates the efficient execution of sparse DNNs.
Merlin is synthesized in a 40 nm technology. Compared with a state-of-the-art DNN accelerator that supports only weight sparsity, Merlin with the proper computation dataflow achieves up to 17.03x, 14.46x, and 88.55x reductions in DRAM accesses, energy consumption, and Energy Delay Product (EDP), respectively, when executing the sparse version of AlexNet, with only 4.88% area overhead.
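The two ideas named in the abstract, the one-dimensional sparse format and the Dual Indexing Module, can be illustrated with a minimal software sketch. The thesis does not publish its exact encoding or hardware interface, so the sketch below assumes a bitmask-plus-compacted-values layout, and all names (encode_sparse_1d, dual_index) are illustrative assumptions rather than the thesis's design.

```python
# Hypothetical software model of the abstract's two ideas.
# Assumption: the one-dimensional sparse format keeps one mask bit per
# element plus only the non-zero values; the thesis may encode differently.

def encode_sparse_1d(dense):
    """Encode a 1-D vector as (bitmask, compacted non-zero values)."""
    bitmask = [1 if x != 0 else 0 for x in dense]
    values = [x for x in dense if x != 0]
    return bitmask, values

def dual_index(n_mask, n_vals, w_mask, w_vals):
    """Model of dual indexing: yield (neuron, weight) pairs only at
    positions where BOTH masks are 1, i.e. the effectual pairs."""
    ni = wi = 0  # running offsets into the compacted value arrays
    for nb, wb in zip(n_mask, w_mask):
        if nb and wb:
            yield n_vals[ni], w_vals[wi]
        ni += nb  # advance past a stored neuron value
        wi += wb  # advance past a stored weight value

# Example: of eight neuron/weight positions, only index 4 is non-zero in
# both vectors, so one multiply-accumulate is issued instead of eight.
n_mask, n_vals = encode_sparse_1d([0, 3, 0, 0, 7, 0, 1, 0])
w_mask, w_vals = encode_sparse_1d([2, 0, 0, 0, 5, 0, 0, 4])
assert list(dual_index(n_mask, n_vals, w_mask, w_vals)) == [(7, 5)]
```

In hardware, the mask intersection and offset bookkeeping would plausibly be done by dedicated comparator and index logic rather than a sequential loop; the point of the sketch is only that intersecting two masks cheaply identifies which stored values to forward to the multipliers.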
Abstract in Chinese . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . i
Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ii
Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iii
Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iv
List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vi
List of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viii
Chapter 1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
Chapter 2 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.1 Basics of Deep Neural Network . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.2 The Neuron Sparsity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2.3 The Weight Sparsity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
Chapter 3 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
Chapter 4 Dual Index Module: Exploiting Sparsity of Neurons and Weights . . 12
4.1 Direct Indexing on both of the Neurons and Weights . . . . . . . . . . . . . . . 13
4.2 Architecture of Dual Index Module . . . . . . . . . . . . . . . . . . . . . . . . 15
Chapter 5 The Merlin Accelerator . . . . . . . . . . . . . . . . . . . . . . . . . 18
5.1 Baseline Sparse DNN Accelerator: Cambricon-X . . . . . . . . . . . . . . . . 18
5.2 Merlin Accelerator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
Chapter 6 Analysis of Computation Dataflow . . . . . . . . . . . . . . . . . . . 26
6.1 Lane Oriented Output Stationary (LO-OS) . . . . . . . . . . . . . . . . . . . . 29
6.2 Brick Oriented Output Stationary (BO-OS) . . . . . . . . . . . . . . . . . . . . 30
6.3 Comparison between LO-OS and BO-OS . . . . . . . . . . . . . . . . . . . . 31
Chapter 7 Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
7.1 Accelerators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
7.2 Simulation Methodology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
7.3 Experimental Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
Chapter 8 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45