

Graduate Student (English): Yi-Xian Kuo
Thesis Title (English): An NEDA-Based Accelerator with Memory-Interleaving for Deep Convolutional Neural Networks
Advisor (English): Yeong-Kang Lai
Committee Members (English): Chen-Hao Chang, Chao-Tsung Huang
Keywords (English): Accelerator, CNN, VLSI, Architecture Design
Convolutional neural networks (CNNs) in deep learning have recently become popular in many applications, from speech recognition to image classification and object detection. In object detection, YOLO (You Only Look Once) is a well-known algorithm. The YOLO network requires a large number of multiply-accumulate computations, so dedicated hardware must be designed to accelerate them at the edge.
To reduce hardware cost, a new distributed arithmetic (DA) architecture similar to NEDA is proposed, replacing multipliers with adders to cut power and area cost while maintaining the speed and accuracy required by digital signal processing (DSP) applications. Mathematical analysis shows that DA can realize multiplication in two's-complement form using only additions, followed by a final data shift, so that adders take over the work of multipliers. In addition, this thesis performs max-pooling directly after convolution, thereby reducing bandwidth. Finally, the most distinctive feature of this design is that one PE can execute 1.78 MAC operations per clock cycle. These three techniques are the particular contributions of this thesis; among the works we have surveyed, none presents similar ideas or techniques. The goal of using DA is that fewer weight bits require fewer clock cycles, so further reducing the weight bit width can shorten the cycle count even more.
CNNs in deep learning have become popular in many recent applications, from speech recognition to image classification and object detection. Among object detectors, YOLO (You Only Look Once) is a well-known algorithm. YOLO requires a large number of multiply-accumulate (MAC) calculations, so dedicated hardware must be designed on the edge side to speed them up. A CNN is dominated by MAC operations: a state-of-the-art CNN requires billions of operations to predict a single image, so it is well suited to highly parallel hardware. Data must be moved through memory to support these calculations, and since data movement consumes more energy than computation, the hardware architecture must not only provide high parallelism and high throughput but also optimize data movement across the entire CNN system to achieve efficient data reuse. This optimization must also adapt to convolutions of different shapes and dimensions. To meet these challenges, the dataflow design is critical: it should support highly parallel computation while minimizing the energy of memory data movement, exploit data reuse through a multi-level storage hierarchy to reduce the cost of data movement, and remain reconfigurable to support convolutions of different shapes and dimensions. To reduce hardware cost, a new distributed arithmetic (DA) architecture similar to NEDA is proposed, in which multipliers are replaced by adders; the purpose is to reduce power and area while maintaining high speed and accuracy. Mathematical analysis proves that DA can implement multiplication in two's-complement form using only additions, followed by a final data shift, so that adders replace multipliers.
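The adder-only multiplication described above can be illustrated with a short behavioral sketch. This is a minimal model of the DA idea, not the thesis's RTL; the function name and the 8-bit two's-complement weight width are assumptions for illustration:

```python
def da_dot(weights, xs, bits=8):
    """Multiplier-free dot product in the spirit of distributed arithmetic.

    Each weight is an integer in `bits`-bit two's complement. For every bit
    position we add up the inputs whose weight bit is set (additions only),
    then combine the partial sums with left shifts; the sign bit carries
    the negative weight -2^(bits-1) of two's complement.
    """
    acc = 0
    for i in range(bits):
        partial = 0
        for w, x in zip(weights, xs):
            if (w >> i) & 1:        # bit i of the weight
                partial += x        # addition only, no multiplier
        if i == bits - 1:           # sign bit position: weight is -2^(bits-1)
            acc -= partial << i
        else:
            acc += partial << i
    return acc
```

With weights in [-128, 127] this reproduces the ordinary dot product, e.g. `da_dot([3, -2, 5], [10, 7, -4])` equals 3*10 + (-2)*7 + 5*(-4) = -4, and fewer weight bits directly mean fewer partial-sum iterations, which is the cycle-count saving the thesis targets.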
In addition, in this thesis, max-pooling is performed immediately after convolution to reduce bandwidth. Finally, the most distinctive feature of this design is that one PE can perform 1.78 MAC operations per clock cycle. The three methods above are the particular contributions of this thesis; among the works surveyed, none presents similar ideas or techniques. The purpose of using DA is that fewer weight bits require fewer clock cycles.
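The bandwidth saving from fusing max-pooling into the convolution output path can be sketched as follows. This is a behavioral model assuming a 2x2 window with stride 2 (the configuration implied by the bandwidth-reduction claim); the function name is illustrative:

```python
def maxpool2x2_stream(conv_rows):
    """Fuse 2x2 stride-2 max-pooling into the convolution output stream.

    `conv_rows` is the stream of convolution output rows. Only one pooled
    value per 2x2 window is ever written out, so the off-chip output
    traffic drops to 1/4 of the unpooled feature-map size.
    """
    out = []
    it = iter(conv_rows)
    for top in it:
        bottom = next(it)  # consume two consecutive rows per pooled row
        pooled = [max(top[j], top[j + 1], bottom[j], bottom[j + 1])
                  for j in range(0, len(top), 2)]
        out.append(pooled)
    return out
```

Because each pooled value is produced as soon as its four convolution outputs exist, the unpooled feature map never has to be written to and read back from memory.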
Chapter 1 Introduction 1
1. Object Detection 2
2. Deep Learning 4
Chapter 2 Related Theory and Literature 6
1. You Only Look Once (YOLO) 6
(1) YOLO Architecture 6
1. Feature Extraction 6
2. Classifier 8
(2) YOLO Algorithm 10
2. New Distributed Arithmetic (NEDA) 12
(1) Distributed Arithmetic (DA) 12
(2) New Distributed Arithmetic (NEDA) 13
3. Quantization 15
4. Eyeriss v2 17
5. NullHop 18
6. An Architecture to Accelerate Convolution in DNN 19
7. Energy-Efficient Design of Processing Element for CNN 20
8. An Energy-Efficient Architecture for Binary Weight 21
Chapter 3 Compressed YOLO Network Algorithm for Object Detection and Simulation Results 22
1. Introduction 22
2. YOLO v2 Tiny Neural Network 23
(1) Network Training 23
1. Dataset 23
2. Neural Network Training 24
(2) Batch Normalization Fused Convolution 25
(3) Post-Training Quantization 29
3. Simulation Results 32
Chapter 4 Hardware Architecture Design and Implementation 34
1. Introduction 34
2. Hardware Specifications 34
3. Hardware Design 35
(1) System Architecture Design 35
(2) DRAM 36
(3) Feature Map Buffer 39
(4) Weight Buffer 44
(5) Memory Address Generator 46
(6) Accelerator 51
(7) PE 53
(8) Max Pooling 55
(9) Output Buffer 55
4. Dataflow Design 56
5. Finite State Machine and Control Unit Design 62
6. Implementation Results 66
(1) Digital IC Design Flow 66
(2) Chip Specifications 67
(4) LAYOUT 72
Chapter 5 Conclusion 73
References 74
[1] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. In: CVPR. (2016)
[2] W. Pan, A. Shams, M. A. Bayoumi, "NEDA: A new distributed arithmetic architecture and its application to discrete cosine transform", IEEE Workshop on SiPS, pp. 159-168, 1999.
[3] Longa, P.; Miri, A. Area-efficient FIR filter design on FPGAs using distributed arithmetic. In Proceedings of the 6th IEEE International Symposium on Signal Processing and Information Technology, Vancouver, BC, Canada, 28–30 August 2006.
[4] Jacob, B., Kligys, S., Chen, B., Zhu, M., Tang, M., Howard, A., et al.: Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference, arXiv preprint (2017).
[5] Y.-H. Chen, T.-J. Yang, J. Emer, and V. Sze, “Eyeriss v2: A flexible accelerator for emerging deep neural networks on mobile devices,” IEEE J. Emerging Sel. Topics Circuits Syst., vol. 9, no. 2, pp. 292–308, Jun. 2019.
[6] Y. Ma, Y. Cao, S. Vrudhula, and J.-S. Seo, "Optimizing the convolution operation to accelerate deep neural networks on FPGA," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 26, no. 7, pp. 1354–1367, Jul. 2018.
[7] A. Aimar, H. Mostafa, et al., "NullHop: A flexible convolutional neural network accelerator based on sparse representations of feature maps," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., 2017.
[8] A. Ardakani, C. Condo, M. Ahmadi, W. J. Gross, "An architecture to accelerate convolution in deep neural networks", IEEE Trans. Circuits Syst. I Reg. Papers, vol. 65, no. 4, pp. 1349-1362, Apr. 2018.
[9] Y. Choi et al., "Energy-efficient design of processing element for convolutional neural network", IEEE Trans. Circuits Syst. II Exp. Briefs, vol. 64, no. 11, pp. 1332-1336, Nov. 2017.
[10] Y. Wang, J. Lin, Z. Wang, "An energy-efficient architecture for binary weight convolutional neural networks", IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 26, no. 2, pp. 280-293, Feb. 2018.
[11] Y.-H. Chen, T. Krishna, J. Emer, V. Sze, "Eyeriss: An energy-efficient reconfigurable accelerator for deep convolutional neural networks", IEEE J. Solid-State Circuits, vol. 51, no. 1, pp. 127-138, Jan. 2017.
[12] S. Han, H. Mao, and W. J. Dally. Deep compression: Compressing deep neural network with pruning, trained quantization and huffman coding. CoRR, abs/1510.00149, 2, 2015.
[13] J. I. Guo, R. C. Ju, J. W. Chen, "An efficient 2-D DCT/IDCT core design using cyclic convolution and adder-based realization", IEEE Trans. Circuits Syst. Video Technol., vol. 14, no. 4, pp. 416-428, Apr. 2004.
[14] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet Classification with Deep Convolutional Neural Networks,” in NIPS, 2012
[15] Y.-S. Jehng, "An efficient and simple VLSI tree architecture for motion estimation algorithms", IEEE Trans. Signal Processing, vol. 41, pp. 889-900, Feb. 1993.
[16] R. Zhao, Y. Hu, J. Dotzel, C. De Sa, and Z. Zhang, "Improving neural network quantization without retraining using outlier channel splitting," in Proc. Int. Conf. Mach. Learn. (ICML), 2019, pp. 7543–7552.
[17] H. T. Kung, "Why systolic architectures?", IEEE Computer, vol. 15, no. 1, pp. 37-46, 1982.
[18] V. Sze, Y.-H. Chen, T.-J. Yang, and J. S. Emer, "Efficient processing of deep neural networks: A tutorial and survey," Proceedings of the IEEE, vol. 105, pp. 2295–2329, Dec. 2017.
[19] R. Girshick, "Fast R-CNN," in ICCV, 2015.
[20] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, "SSD: Single shot multibox detector," 2015.
[21] R. Alvarez, R. Prabhavalkar, and A. Bakhtin, "On the efficient representation and execution of deep acoustic models," in Proc. Interspeech, 2016.
[22] A. Bhandare, V. Sripathi, D. Karkada, V. Menon, S. Choi, K. Datta, and V. Saletore, "Efficient 8-bit quantization of transformer neural machine language translation model," arXiv:1906.00532, 2019.
[23] H. M. Jong, L. G. Chen, T. D. Chiueh, "Parallel architectures for 3-step hierarchical search block-matching algorithm," IEEE Trans. Circuits Syst. Video Technol., vol. 4, pp. 407-416, Aug. 1994.
[24] Y. Chen et al., "DaDianNao: A machine-learning supercomputer," in Proc. 47th Annu. IEEE/ACM Int. Symp. Microarchitecture, 2014, pp. 609–622.
[25] H. Kung, B. McDanel, S. Q. Zhang, "Packing sparse convolutional neural networks for efficient systolic array implementations: Column combining under joint optimization," Proc. 24th Int. Conf. Archit. Support Program. Lang. Operating Syst., pp. 821-834, 2019.
[26] A. G. Howard et al., "MobileNets: Efficient convolutional neural networks for mobile vision applications," Apr. 2017.
[27] D. Jung, W. Jung, B. Kim, S. Lee, W. Rhee, J. H. Ahn, "Restructuring batch normalization to accelerate CNN training," Jul. 2018.
[28] M. Sandler, A. G. Howard, M. Zhu, A. Zhmoginov, L. Chen, "MobileNetV2: Inverted residuals and linear bottlenecks," in CVPR, 2018.
Electronic full text (publicly available online from 2023/02/07)