|
[1]A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet Classification with Deep Convolutional Neural Networks,” in Proc. Adv. Neural Inf. Syst., vol. 25. 2012, pp. 1097-1105. [2] O. Russakovsky et al., “ImageNet large scale visual recognition challenge,” Int. J. Comput. Vis., vol. 115, no. 3, pp. 211–252, Dec. 2015. [3] J. Redmon, S. Divvala, R. Girshick and A. Farhadi, "You Only Look Once: Unified, Real-Time Object Detection," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 2016, pp. 779-788. [4]I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair et al., "Generative Adversarial Nets," in Proc. Adv. Neural Inf. Syst., 2014, pp. 2672-2680. [5]“Acceleration in the AWS Cloud”, Internet: https://www.xilinx.com/products/design-tools/acceleration-zone/aws.html, [Jul. 24, 2018]. [6] P. Norman. et al, “In-Datacenter Performance Analysis of a Tensor Processing Unit TM,” in Proc. 44th Annu. Int. Symp. Comput. Archit. (ISCA), 2016, pp. 1-12. [7]N. Jouppi , “Google Supercharges Machine Learning Tasks with TPU Custom Chip”, Internet: https://cloudplatform.googleblog.com/2016/05/Google-supercharges-machine-learning-tasks-with-custom-chip.html, May 18, 2016 [Jul. 24, 2018]. [8]T. Chen, Z. Du, N. Sun, J. Wang, C. Wu, Y. Chen, and O. Temam, “DianNao: A Small-Footprint High-Throughput Accelerator for Ubiquitous Machine-Learning,” in Proc. 19th Int. Conf. Archit. Support Program. Lang. Oper. Syst., 2014, pp. 269–284. [9] Y. Chen, T. Luo, S. Liu, S. Zhang, L. He, J. Wang, L. Li, T. Chen, Z. Xu, N. Sun and O. Temam, “DaDianNao: A Machine-Learning Supercomputer,” in Proc. 47th Annu. IEEE/ACM Int. Symp. Microarchitecture, Dec. 2014, pp. 609-622. [10]Z. Du et al., “ShiDianNao: Shifting Vision Processing Closer to the Sensor,” in Proc. 42th Annu. Int. Symp. Comput. Archit. (ISCA), 2015, pp. 92-104. [11]Liu, Daofu, et al., “PuDianNao: A Polyvalent Machine Learning Accelerator,” in Proc. 20th Int. Conf. on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2015, pp. 369-381. [12]Zhang, Shijin, et al., “Cambricon-X: An Accelerator for Sparse Neural Networks,” in Proc. 49th Annu. IEEE/ACM Int. Symp. Microarchitecture, 2016, pp. 20. [13]S. Liu et al., “Cambricon: An Instruction Set Architecture for Neural Networks,” in Proc. 43th Annu. Int. Symp. Comput. Archit. (ISCA), 2016, pp. 393-405. [14]“Cambricon Technologies”, Internet: http://www.cambricon.com/index.php?c=page&id=10, [Jul. 24, 2018]. [15]“Deephi Tech”, Internet: http://www.deephi.com/technology.html, [Jul. 24, 2018]. [16]D. Lin, S. Talathi and V. S. Annapureddy, “Fixed Point Quantization of Deep Convolutional Networks,” in Proc. 33th Int. Conf. Mach. Learn. (ICML), 2016, pp. 2849-2858. [17]A. Krizhevsky and G. Hinton, “Learning Multiple Layers of Features from Tiny Images,” in Technical report, Vol. 1. No. 4, University of Toronto, 2009, p. 7. [18]S. Gupta, A. Agrawal, K. Gopalakrishnan and P. Narayanan, “Deep Learning with Limited Numerical Precision,” in Proc. 32th Int. Conf. Mach. Learn. (ICML), 2015, pp. 1-10. [19]Y. Lecun and C. Cortes, “The MNIST Database of Handwritten Digits”, Internet: http://yann.lecun.com/exdb/mnist/, [Jul. 24, 2018]. [20]S. Han, H. Mao, W. J. Dally, “Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding,” arXiv preprint arXiv:1510.00149, 2015. [21]M. Rastegari, Mohammad, V. Ordonez, J. Redmon, A. Farhadi, “Xnor-Net: Imagenet Classification Using Binary Convolutional Neural Networks,” In European Conference on Computer Vision, Springer, 2016, pp. 525-542. [22]K. He, X. Zhang, S. Ren, and J. Sun, “Deep Residual Learning for Image Recognition,” in Proc. IEEE Conf. on Comput. Vis. Pattern Recognit. (CVPR), 2016, pp 770-778. [23]I. Hubara, M. Courbariaux, D. Soudry, R. El-Yaniv, Y. Bengio, “Quantized Neural Networks: Training Neural Networks with Low Precision Weights and Activation,” arXiv preprint arXiv:1609.07061, 2016 [24] Y. Chen, T. Krishna, J. Emer, and V. Sze, “Eyeriss: An Energy-Efficient Reconfigurable Accelerator for Deep Convolutional Neural Networks,” in Proc. IEEE Int. Solid-State Circuits Conf., Feb. 2016, pp. 1-43. [25]S. Han, X. Liu, H. Mao, J. Pu, A. Pedram, M. Horowitz, W. Dally, “EIE: Efficient Inference Engine on Compressed Deep Neural Network,” in Proc IEEE/ACM Int. Symp. on Computer Archit., 2016, pp. 243-254. [26]J. Albericio, P. Judd, T. Hetherington, T. Aamodt, N. E. Jerger, and A. Moshovos. “Cnvlutin: Ineffectual-Neuron-Free Deep Convolutional Neural Network Computing,” in Proc. 44th Annu. Int. Symp. Comput. Archit. (ISCA), June 2016, pp. 1-13. [27]D. Shin, J. Lee, J. Lee and H. J. Yoo, "14.2 DNPU: An 8.1TOPS/W Reconfigurable CNN-RNN Processor for General-Purpose Deep Neural Networks," in Proc. IEEE Int. Solid-State Circuits Conf., 2017, pp. 240-241. [28] J. Lee, C. Kim, S. Kang, D. Shin, S. Kim and H. J. Yoo, "UNPU: A 50.6TOPS/W Unified Deep Neural Network Accelerator with 1b-to-16b Fully-Variable Weight Bit-Precision," in Proc. IEEE Int. Solid-State Circuits Conf., 2018, pp. 218-220. [29]P. Lin, “Low-complexity Convolution Neural Network Training and Low Power Circuit Design of its Processing Element,” M.A. thesis, National Taiwan University, Taipei, 2017. [30]K.-H. Chen, C.-N. Chen, and T.-D. Chiueh, “Grouped Signed Power-of-Two Algorithms for Low-Complexity Adaptive Equalization,” IEEE Trans. Circuits Syst. II, Exp. Briefs, Vol. 52, No. 12, pp. 816-820, Dec. 2005. [31]Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell, “Caffe: Convolutional Architecture for Fast Feature Embedding,” in Proc. ACM Int. Conf. on Multimedia, 2014, pp. 675-678. [32] V. Nair and G. E. Hinton. “Rectified Linear Units Improve Restricted Boltzmann Machines,” in Proc. 27th Int. Conf. on Mach. Learn., 2010, pp. 807-814. [33]K. Simonyan and A. Zisserman, “Very Deep Convolutional Networks for Large-Scale Image Recognition,” arXiv preprint arXiv:1409.1556, 2014. [34] “CIC ARC HS Based SoC Design Platform”, http://www.cic.org.tw/aisoc/aisoc.jsp, [Jul. 24, 2018]. [35] Z. Yuan, Y. Liu, J. Yue, J. Li, H. Yang, “CORAL: Coarse-grained Reconfigurable Architecture for Convolutional Neural Networks,” In Proc IEEE/ACM Int. Symp. on Low Power Electronics and Design (ISLPED), 2017, pp. 1-6. [36]G. Venkatesh, E. Nurvitadhi and D. Marr, “Accelerating Deep Convolutional Networks using low-precision and sparsity,” in Proc. IEEE Int. Conf. Acoust., Speech Signal Process. (ICASSP), New Orleans, LA, 2017, pp. 2861-2865. [37]A. Parashar et al., "SCNN: An accelerator for compressed-sparse convolutional neural networks," in Proc. 44th Annu. Int. Symp. Comput. Archit. (ISCA), 2017, pp. 27-40. [38] S. Wang, D. Zhou, X. Han and T. Yoshimura, "Chain-NN: An Energy-Efficient 1D Chain Architecture for Accelerating Deep Convolutional Neural Networks," in Proc. IEEE Des. Autom. Test Eur. (DATE), 2017, Lausanne, 2017, pp. 1032-1037. [39]Y. Shen, M. Ferdman and P. Milder, "Escher: A CNN Accelerator with Flexible Buffering to Minimize Off-Chip Transfer," In Proc. 25th IEEE Int. Symp. on Field-Programmable Custom Computing Machines (FCCM)., 2017, pp. 93-100. [40]C. Wang, L. Gong, Q. Yu, X. Li, Y. Xie and X. Zhou, "DLAU: A Scalable Deep Learning Accelerator Unit on FPGA," in IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems, Vol. 36, No. 3, pp. 513-517, March 2017. [41]Y. Umuroglu, et al., "Finn: A framework for Fast, Scalable Binarized Neural Network Inference." in Proc. ACM/SIGDA Int. Symp. Field-Program. Gate Arrays, 2017, pp. 65-74. [42]C. Zhang, Z. Fang, P. Zhou, P. Pan and J. Cong, "Caffeine: Towards Uniformed Representation and Acceleration for Deep Convolutional Neural Networks," in Proc. IEEE/ACM Int. Conf. Comput.-Aided Des. (ICCAD), 2016, pp. 1-8. [43] C. Zhang, P. Li, G. Sun, Y. Guan, B. Xiao, J. Cong, "Optimizing FPGA-based Accelerator Design for Deep Convolutional Neural Networks," in Proc. ACM/SIGDA Int. Symp. Field-Program. Gate Arrays, 2015, pp. 161-170. [44]H. Li, X. Fan, L. Jiao, W. Cao, X. Zhou and L. Wang, "A High Performance FPGA-based Accelerator for Large-scale Convolutional Neural Networks," In Proc. 26th Int. Conf. on Field Programmable Logic and Applications (FPL ’16) 2016, pp. 1-9. [45] P. Judd, A. Delmas, S. Sharify, A. Moshovos "Cnvlutin2: Ineffectual-Activation-and-Weight-Free Deep Neural Network Vomputing," arXiv preprint arXiv:1705.00125, 2017. [46]P. Judd, J. Albericio, T. Hetherington, T. M. Aamodt and A. Moshovos, "Stripes: Bit-Serial Deep Neural Network Computing," in Proc. 49th Annu. IEEE/ACM Int. Symp. Microarchitecture, 2016, pp. 1-12. [47] S. Venkataramani, et al. "Scaledeep: A Scalable Compute Architecture for Learning and Evaluating Deep Networks." in Proc. 44th Annu. Int. Symp. Comput. Archit., 2017, pp. 13–26. [48]W. Zhao, H. Fu, W. Luk, T. Yu, S. Wang, B. Feng, Y. Ma, G. Yang, “F-CNN: An FPGA-based Framework for Training Convolutional Neural Networks,” in Proc. IEEE Conf. on Application-specific Systems, Architectures and Processors, London, UK, July 2016, pp. 107-114. [49]X. Han, D. Zhou, S. Wang, S. Kimura, “CNN-MERP: An FPGA-Based Memory-Efficient Reconfigurable Processor for Forward and Backward Propagation of Convolutional Neural Networks,” in Proc. Int. Conf. on Computer Design (ICCD), 2017, pp. 320-327.
|