Author: 林柏翰
Author (English): Po-Han Lin
Title: 一個基於八位元連續漸進逼近式類比數位轉換器且操作於一億赫茲之混訊神經網路加速器
Title (English): An 8-bit 100-MHz SAR ADC-Based Mixed-Signal Accelerator for Neural Networks
Advisor: 張順志
Advisor (English): Soon-Jyh Chang
Degree: Master's
Institution: National Cheng Kung University (國立成功大學)
Department: Department of Electrical Engineering (電機工程學系)
Discipline: Engineering
Field: Electrical Engineering and Computer Science
Document type: Academic thesis
Year of publication: 2020
Graduation academic year: 108 (2019-2020)
Language: English
Pages: 109
Keywords (Chinese): 混訊加速器、類比運算、乘加累積、神經網路、連續漸進逼近式類比數位轉換器
Keywords (English): mixed-signal accelerator; analog computation; multiply-accumulate (MAC); neural networks; successive-approximation register (SAR) analog-to-digital converter (ADC)
This thesis presents an 8-bit mixed-signal neural-network accelerator based on a successive-approximation register (SAR) analog-to-digital converter (ADC) and operating at 100 MHz. In addition to quantizing the activations and weights of the neural networks, the accelerator adopts analog computation to further reduce the energy required by arithmetic operations. Furthermore, to improve the top-1 accuracy of the neural networks, a 5-phase switching scheme that performs the multiply-accumulate (MAC) operation is proposed to reduce the dynamic offset. Finally, a SAR ADC is integrated into the accelerator to quantize the analog MAC result into a digital output code.
The test chip was implemented in a TSMC 40-nm CMOS standard 1P9M process. The chip occupies 2.613 mm², of which the core circuit accounts for 25.8%. With a 0.9-V supply and a 100-MHz clock, the design achieves top-1 accuracies of 99.3% and 86% on the MNIST and CIFAR10 datasets, respectively, with an energy efficiency of 3.3 TOPS/W; taking the energy per arithmetic operation divided by the number of ADC output levels as the figure of merit, the accelerator achieves 1.18 fJ/step under these conditions. Operating the accelerator from a 0.7-V supply at an 80-MHz clock further improves both metrics: the top-1 accuracies on the MNIST and CIFAR10 datasets remain 99.3% and 85.3%, respectively, while the energy efficiency and figure of merit become 6.34 TOPS/W and 0.62 fJ/step.
This thesis presents an 8-bit 100-MHz SAR ADC-based mixed-signal accelerator for neural networks. In addition to quantizing the activations and weights of the neural networks, analog computation is adopted in the accelerator to further reduce the energy consumption per arithmetic operation. Moreover, to enhance the top-1 accuracies of the neural networks, a 5-phase switching scheme that performs the multiply-accumulate (MAC) operation is proposed to mitigate the dynamic offset. Finally, a successive-approximation register (SAR) analog-to-digital converter (ADC) is incorporated into the proposed accelerator to quantize the analog MAC signal into a digital output code.
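As a rough behavioral illustration of this quantize-compute-quantize datapath (not the thesis circuitry), the Python sketch below quantizes activations and weights to 8 bits, forms the dot product as an ideal "analog" computation, and re-quantizes the result with an ideal 8-bit ADC model. All names, signal ranges, and the normalization by the vector length are illustrative assumptions.

    import numpy as np

    def quantize(x, n_bits=8, full_scale=1.0):
        # Uniform (linear) quantizer over [-full_scale, +full_scale].
        levels = 2 ** n_bits
        step = 2.0 * full_scale / levels
        codes = np.clip(np.round(x / step), -levels // 2, levels // 2 - 1)
        return codes * step  # dequantized value

    def mac_with_adc(activations, weights, n_bits=8):
        # Quantize the inputs, compute the MAC as an ideal analog dot
        # product, then digitize the result with an ideal 8-bit ADC model.
        a_q = quantize(activations, n_bits)
        w_q = quantize(weights, n_bits)
        analog_mac = np.dot(a_q, w_q) / len(a_q)  # keep result in full scale
        return quantize(analog_mac, n_bits)

    rng = np.random.default_rng(0)
    print(mac_with_adc(rng.uniform(-1, 1, 16), rng.uniform(-1, 1, 16)))

The actual design performs the dot product with passive charge redistribution on capacitors [15][30] rather than in floating point; the sketch only mirrors the bit widths and the quantization steps of the signal flow.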
The proof-of-concept prototype was fabricated in a TSMC 40-nm CMOS standard 1P9M process; the chip occupies 2.613 mm², and the core circuit accounts for 25.8% of the total area. With a 100-MHz clock frequency and a 0.9-V supply voltage, the design achieves top-1 accuracies of 99.3% and 87.3% on the MNIST and CIFAR10 datasets, respectively. In addition, an energy efficiency of 3.3 TOPS/W is attained, and the figure of merit (FOM), i.e., the energy consumption per arithmetic operation normalized to the quantization steps of the ADC output, is 1.18 fJ/step. For better energy efficiency and FOM, the prototype can be operated with an 80-MHz clock frequency and a 0.7-V supply voltage; in this case, the top-1 accuracies on the MNIST and CIFAR10 datasets are 99.3% and 86%, respectively, and the energy efficiency and FOM are 6.34 TOPS/W and 0.62 fJ/step.
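The quoted FOM values follow directly from the energy-efficiency figures: the energy per operation is the reciprocal of the efficiency, and dividing by the 2^8 = 256 quantization steps of the 8-bit ADC output gives the fJ/step numbers. A quick arithmetic check, using only the figures stated above:

    # Energy per op = 1 / efficiency; FOM = energy per op / 2^8 steps.
    for tops_per_w in (3.3, 6.34):
        energy_per_op = 1.0 / (tops_per_w * 1e12)       # joules per operation
        fom_fj_per_step = energy_per_op / 2**8 * 1e15   # femtojoules per step
        print(f"{tops_per_w} TOPS/W -> {fom_fj_per_step:.2f} fJ/step")
    # Prints 1.18 fJ/step and 0.62 fJ/step, matching the reported values.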
Table of Contents
Chinese Abstract I
Abstract II
List of Tables IX
List of Figures X
Chapter 1 Introduction 1
1.1 Background and Motivation 1
1.2 Thesis Organization 5
Chapter 2 Basics of Neural Networks 6
2.1 Basics of Deep Neural Network (DNN) 7
2.2 Overview of CNN 10
2.2.1 Basics of CNN 10
2.2.2 Non-Linear Activation Function 13
2.2.3 Pooling Function 15
2.2.4 Normalization Function 16
2.3 Quantization of Neural Networks 17
2.3.1 Linear Quantization 18
2.3.2 Non-Linear Quantization 20
2.3.3 Quantization-Aware Training Technique [24] 22
Chapter 3 Fundamentals of SAR ADC 24
3.1 Building Block of SAR ADC 25
3.1.1 Behavioral Operation of SAR ADC 26
3.1.2 Circuit-Level Operation of SAR ADC 29
3.2 Quantization Error 32
3.3 Static Specifications 35
3.3.1 Offset Error 35
3.3.2 Gain Error 36
3.3.3 Nonlinearity 38
3.3.3.1 Differential Nonlinearity Error 38
3.3.3.2 Integral Nonlinearity 40
3.4 Dynamic Specifications 43
3.4.1 Spurious-Free Dynamic Range 43
3.4.2 Signal-to-Noise Ratio 44
3.4.3 Signal-to-Noise and Distortion Ratio 46
3.4.4 Total Harmonic Distortion 47
3.4.5 Effective Number of Bits 48
3.4.6 Effective Resolution Bandwidth 48
3.4.7 Figure of Merit 49
Chapter 4 An 8-bit 100-MHz SAR ADC-Based Accelerator for Neural Networks 50
4.1 Introduction of Overall Architecture 51
4.2 Multiply-Accumulate Unit 52
4.2.1 Passive Digital-to-Analog Multiplier Circuit [15] [30] 53
4.2.2 Systematic Offsets from Parasitic Capacitances 56
4.2.3 Proposed 5-Phase Switching Scheme 61
4.3 Adopted Techniques of SAR ADC 66
4.3.1 Merged Capacitor Switching Method [31] 66
4.3.2 Direct Switching Technique [34] and Compact Combinational Timing Control [35] 69
4.4 Circuit Realization 71
4.4.1 Phase Generator 71
4.4.2 MAC Control Logic 73
4.4.3 Dynamic Comparator 75
4.4.4 Capacitive DAC 77
Chapter 5 Simulation and Measurement Results 81
5.1 Layout and Chip Floor Plan 81
5.2 Simulation Results 85
5.3 Design Consideration for PCB 92
5.4 Die Micrograph and Measurement Setup 95
5.5 Measurement Results 97
Chapter 6 Conclusions and Future Work 103
Bibliography 106
[1] D. Silver et al., "Mastering the game of Go with deep neural networks and tree search," Nature, vol. 529, no. 7587, pp. 484-489, 2016.
[2] K. He et al., "Deep Residual Learning for Image Recognition," in Proc. CVPR, arXiv preprint arXiv:1512.03385, 2016.
[3] I. Hubara, M. Courbariaux, D. Soudry, R. El-Yaniv, and Y. Bengio, "Quantized neural networks: Training neural networks with low precision weights and activations," arXiv preprint arXiv:1609.07061, 2016.
[4] S. Zhou, Y. Wu, Z. Ni, X. Zhou, H. Wen, and Y. Zou, "DoReFa-Net: Training low bitwidth convolutional neural networks with low bitwidth gradients," arXiv preprint arXiv:1606.06160, 2016.
[5] M. Rastegari, V. Ordonez, J. Redmon, and A. Farhadi, "XNOR-Net: ImageNet classification using binary convolutional neural networks," arXiv preprint arXiv:1603.05279, 2016.
[6] I. Hubara et al., "Binarized Neural Networks," in Proc. NIPS, arXiv preprint arXiv:1602.02505, 2016.
[7] Q. Dong et al., "15.3 A 351TOPS/W and 372.4GOPS Compute-in-Memory SRAM Macro in 7nm FinFET CMOS for Machine-Learning Applications," in IEEE International Solid-State Circuits Conference (ISSCC), 2020, pp. 242-244.
[8] E. A. Vittoz, "Future of analog in the VLSI environment," in IEEE International Symposium on Circuits and Systems, 1990, vol. 2, pp. 1372-1375.
[9] B. E. Boser, E. Sackinger, J. Bromley, Y. LeCun, R. E. Howard, and L. D. Jackel, "An analog neural network processor and its application to high-speed character recognition," in IJCNN-91-Seattle International Joint Conference on Neural Networks, 1991, vol. 1, pp. 415-420.
[10] P. Masa et al., "10 mW CMOS retina and classifier for handheld, 1000 images/s optical character recognition system," in IEEE International Solid-State Circuits Conference (ISSCC), 1999, pp. 204-205.
[11] J. Lu, S. Young, I. Arel, and J. Holleman, "30.10 A 1TOPS/W analog deep machine-learning engine with floating-gate storage in 0.13μm CMOS," in IEEE International Solid-State Circuits Conference (ISSCC), 2014, pp. 504-505.
[12] C. Xue et al., "15.4 A 22nm 2Mb ReRAM Compute-in-Memory Macro with 121-28TOPS/W for Multibit MAC Computing for Tiny AI Edge Devices," in IEEE International Solid-State Circuits Conference (ISSCC), 2020, pp. 244-246.
[13] X. Si et al., "15.5 A 28nm 64Kb 6T SRAM Computing-in-Memory Macro with 8b MAC Operation for AI Edge Chips," in IEEE International Solid-State Circuits Conference (ISSCC), 2020, pp. 246-248.
[14] K. Watanabe and G. Temes, "A switched-capacitor multiplier/divider with digital and analog outputs," IEEE Transactions on Circuits and Systems, vol. 31, no. 9, pp. 796-800, 1984.
[15] D. Bankman and B. Murmann, "An 8-bit, 16 input, 3.2 pJ/op switched-capacitor dot product circuit in 28-nm FDSOI CMOS," in IEEE Asian Solid-State Circuits Conference (A-SSCC), 2016, pp. 21-24.
[16] X. Glorot, A. Bordes, and Y. Bengio, "Deep sparse rectifier neural networks," in Proc. 14th International Conference on Artificial Intelligence and Statistics (AISTATS), Fort Lauderdale, FL, USA, 2011.
[17] A. L. Maas, A. Y. Hannun, and A. Y. Ng, "Rectifier nonlinearities improve neural network acoustic models," in Proc. ICML, 2013.
[18] D.-A. Clevert, T. Unterthiner, and S. Hochreiter, "Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs)," in Proc. ICLR, 2016.
[19] X. Zhang, J. Trmal, D. Povey, and S. Khudanpur, "Improving deep neural network acoustic models using generalized maxout networks," in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2014, pp. 215-219.
[20] S. Ioffe and C. Szegedy, "Batch normalization: Accelerating deep network training by reducing internal covariate shift," in Proc. ICML, 2015.
[21] Y. Ma, N. Suda, Y. Cao, J. Seo, and S. Vrudhula, "Scalable and modularized RTL compilation of Convolutional Neural Networks onto FPGA," in 26th International Conference on Field Programmable Logic and Applications (FPL), 2016, pp. 1-8.
[22] E. H. Lee, D. Miyashita, E. Chai, B. Murmann, and S. S. Wong, "LogNet: Energy-efficient neural networks using logarithmic computation," in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2017, pp. 5900-5904.
[23] S. Han, H. Mao, and W. J. Dally, "Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding," in Proc. ICLR, 2016.
[24] R. Krishnamoorthi, "Quantizing deep convolutional networks for efficient inference: A whitepaper," arXiv preprint arXiv:1806.08342, 2018.
[25] A. Zhou, A. Yao, Y. Guo, L. Xu, and Y. Chen, "Incremental Network Quantization: Towards Lossless CNNs with Low-Precision Weights," in Proc. ICLR, 2017.
[26] M. Horowitz, "1.1 Computing's energy problem (and what we can do about it)," in IEEE International Solid-State Circuits Conference (ISSCC), 2014, pp. 10-14.
[27] V. Sze, Y. Chen, T. Yang, and J. S. Emer, "Efficient Processing of Deep Neural Networks: A Tutorial and Survey," Proceedings of the IEEE, vol. 105, no. 12, pp. 2295-2329, 2017.
[28] B. Murmann, "ADC Performance Survey 1997-2018," [Online]. Available: http://web.stanford.edu/~murmann/adcsurvey.html
[29] J. L. McCreary and P. R. Gray, "All-MOS charge redistribution analog-to-digital conversion techniques - Part I," IEEE Journal of Solid-State Circuits, vol. 10, no. 6, pp. 371-379, 1975.
[30] D. Bankman and B. Murmann, "Passive charge redistribution digital-to-analogue multiplier," Electronics Letters, vol. 51, no. 5, pp. 386-388, 2015.
[31] V. Hariprasath, J. Guerber, S. Lee, and U. Moon, "Merged capacitor switching based SAR ADC with highest switching energy-efficiency," Electronics Letters, vol. 46, no. 9, pp. 620-621, 2010.
[32] Y. Zhu et al., "A 10-bit 100-MS/s Reference-Free SAR ADC in 90 nm CMOS," IEEE Journal of Solid-State Circuits, vol. 45, no. 6, pp. 1111-1121, 2010.
[33] C. Liu, S. Chang, G. Huang, and Y. Lin, "A 10-bit 50-MS/s SAR ADC With a Monotonic Capacitor Switching Procedure," IEEE Journal of Solid-State Circuits, vol. 45, no. 4, pp. 731-740, 2010.
[34] G. Huang, S. Chang, Y. Lin, C. Liu, and C. Huang, "A 10b 200MS/s 0.82mW SAR ADC in 40nm CMOS," in IEEE Asian Solid-State Circuits Conference (A-SSCC), 2013, pp. 289-292.
[35] 郭哲勳, "A 10-bit 120-MS/s SAR ADC with Compact Architecture and Noise Suppression Technique," thesis, Aug. 22, 2014.
[36] C. Liu, C. Kuo, and Y. Lin, "A 10 bit 320 MS/s Low-Cost SAR ADC for IEEE 802.11ac Applications in 20 nm CMOS," IEEE Journal of Solid-State Circuits, vol. 50, no. 11, pp. 2645-2654, 2015.
[37] X. Si et al., "24.5 A Twin-8T SRAM Computation-In-Memory Macro for Multiple-Bit CNN-Based Machine Learning," in IEEE International Solid-State Circuits Conference (ISSCC), 2019, pp. 396-398.
[38] E. H. Lee and S. S. Wong, "24.2 A 2.5GHz 7.7TOPS/W switched-capacitor matrix multiplier with co-designed local memory in 40nm," in IEEE International Solid-State Circuits Conference (ISSCC), 2016, pp. 418-419.
[39] J. Su et al., "15.2 A 28nm 64Kb Inference-Training Two-Way Transpose Multibit 6T SRAM Compute-in-Memory Macro for AI Edge Chips," in IEEE International Solid-State Circuits Conference (ISSCC), 2020, pp. 240-242.