[1] V. Nair and G. E. Hinton, "Rectified linear units improve restricted Boltzmann machines," in Proceedings of the 27th International Conference on Machine Learning (ICML-10), 2010.
[2] A. Parashar et al., "SCNN: An accelerator for compressed-sparse convolutional neural networks," ACM SIGARCH Computer Architecture News, vol. 45, no. 2, pp. 27-40, 2017.
[3] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," arXiv preprint arXiv:1409.1556, 2014.
[4] Y.-H. Chen et al., "Eyeriss: An energy-efficient reconfigurable accelerator for deep convolutional neural networks," IEEE Journal of Solid-State Circuits, vol. 52, no. 1, pp. 127-138, 2016.
[5] Y.-H. Chen et al., "Eyeriss v2: A flexible accelerator for emerging deep neural networks on mobile devices," IEEE Journal on Emerging and Selected Topics in Circuits and Systems, vol. 9, no. 2, pp. 292-308, 2019.
[6] J. F. Zhang et al., "SNAP: An efficient sparse neural acceleration processor for unstructured sparse deep neural network inference," IEEE Journal of Solid-State Circuits, vol. 56, no. 2, pp. 636-647, 2020.
[7] V. Sze et al., "Overview of deep neural networks," in Efficient Processing of Deep Neural Networks, Synthesis Lectures on Computer Architecture, Springer, Cham, 2020.
[8] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," Advances in Neural Information Processing Systems 25, 2012.
[9] K. He et al., "Deep residual learning for image recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016.
[10] A. G. Howard et al., "MobileNets: Efficient convolutional neural networks for mobile vision applications," arXiv preprint arXiv:1704.04861, 2017.
[11] C. Szegedy et al., "Going deeper with convolutions," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015.
[12] R. Girshick et al., "Rich feature hierarchies for accurate object detection and semantic segmentation," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2014.
[13] W. Liu et al., "SSD: Single shot multibox detector," in Proceedings of the 14th European Conference on Computer Vision (ECCV 2016), Part I, 2016.
[14] J. Redmon et al., "You only look once: Unified, real-time object detection," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016.
[15] O. Ronneberger, P. Fischer, and T. Brox, "U-Net: Convolutional networks for biomedical image segmentation," in Proceedings of the 18th International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI 2015), Part III, 2015.
[16] J. Long, E. Shelhamer, and T. Darrell, "Fully convolutional networks for semantic segmentation," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015.
[17] K. He et al., "Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification," in Proceedings of the IEEE International Conference on Computer Vision, 2015.
[18] Y.-H. Chen, J. Emer, and V. Sze, "Eyeriss: A spatial architecture for energy-efficient dataflow for convolutional neural networks," ACM SIGARCH Computer Architecture News, vol. 44, no. 3, pp. 367-379, June 2016.
[19] A. Ardakani et al., "An architecture to accelerate convolution in deep neural networks," IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 65, no. 4, pp. 1349-1362, 2017.
[20] S. Han et al., "Learning both weights and connections for efficient neural network," Advances in Neural Information Processing Systems 28, 2015.
[21] PyTorch, "Quantization," PyTorch 2.0 documentation.
[22] B. Jacob et al., "Quantization and training of neural networks for efficient integer-arithmetic-only inference," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018.
[23] AMD Xilinx, "ZCU102 Evaluation Board User Guide (UG1182)," 21 Feb. 2022.
[24] AMD Xilinx, "AXI DMA v7.1 LogiCORE IP Product Guide (PG021)," 27 Mar. 2022.
[25] Y. Lu, Y.-L. Wu, and J.-D. Huang, "A coarse-grained dual-convolver based CNN accelerator with high computing resource utilization," in Proceedings of the 2nd IEEE International Conference on Artificial Intelligence Circuits and Systems (AICAS), 2020.
[26] J.-S. Park et al., "A multi-mode 8k-MAC HW-utilization-aware neural processing unit with a unified multi-precision datapath in 4-nm flagship mobile SoC," IEEE Journal of Solid-State Circuits, vol. 58, no. 1, pp. 189-202, 2022.
[27] Arm, "AMBA AXI and ACE Protocol Specification," 28 Oct. 2011.