National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)


Detailed Record

Author: 林建宇
Author (English): Lin, Chien-Yu
Title (Chinese): Merlin: 一個有效利用神經元及權重稀疏性的類神經網路加速器設計
Title (English): Merlin: A Sparse Neural Network Accelerator Utilizing Both Neuron and Weight Sparsity
Advisor: 賴伯承
Advisor (English): Lai, Bo-Cheng
Committee members: 楊佳玲、張添烜、賴伯承
Committee members (English): Yang, Chia-Lin; Chang, Tian-Sheuan; Lai, Bo-Cheng
Oral defense date: 2017-05-25
Degree: Master's
Institution: National Chiao Tung University (國立交通大學)
Department: Institute of Electronics (電子研究所)
Discipline: Engineering
Field: Electrical and Computer Engineering
Document type: Academic thesis
Year of publication: 2017
Graduation academic year: 105 (2016-2017)
Language: English
Number of pages: 48
Keywords (Chinese): 深度類神經網路、加速器、稀疏性
Keywords (English): Deep neural network; Accelerator; Sparsity
Usage statistics:
  • Cited by: 0
  • Views: 324
  • Downloads: 6
  • Bookmarked: 0
Abstract (Chinese): Sparsity is widely observed in today's representative deep neural networks, because a large fraction of the neurons or weights can be zero without affecting accuracy. In a neural network accelerator, storing data in a sparse format saves a large amount of data transfer, and skipping the unnecessary computations yields further speedup. However, the non-zero values in a sparse format are irregularly distributed, which complicates access to specific data and is a major concern when adopting sparse formats. As a result, most published deep network accelerators still use dense formats, or exploit only one of neuron sparsity and weight sparsity. This thesis proposes Merlin, an accelerator that exploits neuron and weight sparsity at the same time. We adopt a simple yet efficient one-dimensional sparse format, and we design an efficient dual-indexing module that finds the commonly non-zero values in a pair of sparse data. Beyond sparsity, we also observe that the data-reuse priorities implied by the computation dataflow have a large impact on performance, so this thesis also gives a detailed analysis of the two computation dataflows the accelerator can adopt and shows that a suitable dataflow further improves execution efficiency.
Merlin is synthesized in a 40 nm process. Compared with a state-of-the-art sparse network accelerator, Merlin adds only 4.88% area; with the more efficient dataflow, Merlin reduces main-memory accesses by 17.03x and energy consumption by 14.46x, and improves EDP by 88.55x.
Abstract (English): Sparsity is widely observed in state-of-the-art deep neural networks (DNNs): a large portion of both neurons and weights can be zeroed without impairing the result quality. By storing a sparse DNN in a sparse format, the reduced data footprint considerably cuts down data movement in a DNN accelerator, and the computation of ineffectual neuron/weight pairs can be skipped to speed up processing. However, the irregular distribution of non-zero data complicates the indexing scheme used to identify non-zero elements, and this becomes one of the main design concerns. Most existing DNN accelerators either focus on dense neural networks or support only the weight sparsity known at compile time. In this thesis, we propose Merlin, a DNN accelerator that exploits both neuron and weight sparsity. A simple yet effective one-dimensional sparse format maintains both neurons and weights, and a Dual Indexing Module (DIM) skips the ineffectual neuron/weight pairs, forwarding only the effectual pairs for actual computation. Besides sparsity, we also observe that the computation dataflow of DNN execution has a huge performance impact and must be considered to attain superior performance. We conduct a comprehensive analysis of the computation dataflow on a sparse DNN and show that a proper computation dataflow further facilitates the efficient execution of sparse DNNs.
Merlin is synthesized in a 40 nm technology. Compared with a state-of-the-art DNN accelerator that supports only weight sparsity, Merlin with the proper computation dataflow achieves up to 17.03x, 14.46x, and 88.55x reductions in DRAM accesses, energy consumption, and Energy Delay Product (EDP), respectively, when executing the sparse version of AlexNet, with only 4.88% area overhead.
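The two ideas named in the abstract, the one-dimensional sparse format and the Dual Indexing Module, can be illustrated with a minimal software sketch. The thesis does not publish its exact encoding or hardware interface, so the sketch below assumes a bitmask-plus-compacted-values layout, and all names (encode_sparse_1d, dual_index) are illustrative assumptions rather than the thesis's design.

```python
# Hypothetical software model of the abstract's two ideas.
# Assumption: the one-dimensional sparse format keeps one mask bit per
# element plus only the non-zero values; the thesis may encode differently.

def encode_sparse_1d(dense):
    """Encode a 1-D vector as (bitmask, compacted non-zero values)."""
    bitmask = [1 if x != 0 else 0 for x in dense]
    values = [x for x in dense if x != 0]
    return bitmask, values

def dual_index(n_mask, n_vals, w_mask, w_vals):
    """Model of dual indexing: yield (neuron, weight) pairs only at
    positions where BOTH masks are 1, i.e. the effectual pairs."""
    ni = wi = 0  # running offsets into the compacted value arrays
    for nb, wb in zip(n_mask, w_mask):
        if nb and wb:
            yield n_vals[ni], w_vals[wi]
        ni += nb  # advance past a stored neuron value
        wi += wb  # advance past a stored weight value

# Example: of eight neuron/weight positions, only index 4 is non-zero in
# both vectors, so one multiply-accumulate is issued instead of eight.
n_mask, n_vals = encode_sparse_1d([0, 3, 0, 0, 7, 0, 1, 0])
w_mask, w_vals = encode_sparse_1d([2, 0, 0, 0, 5, 0, 0, 4])
assert list(dual_index(n_mask, n_vals, w_mask, w_vals)) == [(7, 5)]
```

In hardware, the mask intersection and offset bookkeeping would plausibly be done by dedicated comparator and index logic rather than a sequential loop; the point of the sketch is only that intersecting two masks cheaply identifies which stored values to forward to the multipliers.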
Abstract in Chinese . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . i
Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ii
Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iii
Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iv
List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vi
List of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viii
Chapter 1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
Chapter 2 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.1 Basics of Deep Neural Network . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.2 The Neuron Sparsity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2.3 The Weight Sparsity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
Chapter 3 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
Chapter 4 Dual Index Module: Exploiting Sparsity of Neurons and Weights . . 12
4.1 Direct Indexing on both of the Neurons and Weights . . . . . . . . . . . . . . . 13
4.2 Architecture of Dual Index Module . . . . . . . . . . . . . . . . . . . . . . . . 15
Chapter 5 The Merlin Accelerator . . . . . . . . . . . . . . . . . . . . . . . . . 18
5.1 Baseline Sparse DNN Accelerator: Cambricon-X . . . . . . . . . . . . . . . . 18
5.2 Merlin Accelerator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
Chapter 6 Analysis of Computation Dataflow . . . . . . . . . . . . . . . . . . . 26
6.1 Lane Oriented Output Stationary (LO-OS) . . . . . . . . . . . . . . . . . . . . 29
6.2 Brick Oriented Output Stationary (BO-OS) . . . . . . . . . . . . . . . . . . . . 30
6.3 Comparison between LO-OS and BO-OS . . . . . . . . . . . . . . . . . . . . 31
Chapter 7 Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
7.1 Accelerators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
7.2 Simulation Methodology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
7.3 Experimental Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
Chapter 8 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45