[1] H. Kwon, A. Samajdar, and T. Krishna, "MAERI: Enabling Flexible Dataflow Mapping over DNN Accelerators via Reconfigurable Interconnects," ACM SIGPLAN Notices, vol. 53, no. 2, pp. 461–475, Mar. 2018, doi: 10.1145/3296957.3173176.
[2] T. Chen et al., "DianNao: A Small-Footprint High-Throughput Accelerator for Ubiquitous Machine-Learning," ACM SIGARCH Computer Architecture News, vol. 42, no. 1, pp. 269–284, Feb. 2014, doi: 10.1145/2654822.2541967.
[3] Y.-H. Chen, J. Emer, and V. Sze, "Eyeriss: A Spatial Architecture for Energy-Efficient Dataflow for Convolutional Neural Networks," ACM SIGARCH Computer Architecture News, vol. 44, no. 3, pp. 367–379, Jun. 2016, doi: 10.1145/3007787.3001177.
[4] Y.-H. Chen, T.-J. Yang, J. S. Emer, and V. Sze, "Eyeriss v2: A Flexible Accelerator for Emerging Deep Neural Networks on Mobile Devices," IEEE Journal on Emerging and Selected Topics in Circuits and Systems, vol. 9, no. 2, pp. 292–308, Jun. 2019, doi: 10.1109/JETCAS.2019.2910232.
[5] Z. Du et al., "ShiDianNao: Shifting Vision Processing Closer to the Sensor," in Proc. 42nd Annual International Symposium on Computer Architecture (ISCA), Jun. 2015, pp. 92–104, doi: 10.1145/2749469.2750389.
[6] N. P. Jouppi et al., "In-Datacenter Performance Analysis of a Tensor Processing Unit," in Proc. 44th Annual International Symposium on Computer Architecture (ISCA), Jun. 2017, pp. 1–12, doi: 10.1145/3079856.3080246.
[7] H. Kwon, P. Chatarasi, V. Sarkar, T. Krishna, M. Pellauer, and A. Parashar, "MAESTRO: A Data-Centric Approach to Understand Reuse, Performance, and Hardware Cost of DNN Mappings," IEEE Micro, vol. 40, no. 3, pp. 20–29, May 2020, doi: 10.1109/MM.2020.2985963.
[8] H. Kwon, P. Chatarasi, M. Pellauer, A. Parashar, V. Sarkar, and T. Krishna, "Understanding Reuse, Performance, and Hardware Cost of DNN Dataflows: A Data-Centric Approach," in Proc. 52nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), Oct. 2019, pp. 754–768, doi: 10.1145/3352460.3358252.
[9] A. Parashar et al., "Timeloop: A Systematic Approach to DNN Accelerator Evaluation," in Proc. 2019 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), Apr. 2019, pp. 304–315, doi: 10.1109/ISPASS.2019.00042.
[10] X. Yang et al., "Interstellar: Using Halide's Scheduling Language to Analyze DNN Accelerators," in Proc. 25th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), Mar. 2020, pp. 369–383, doi: 10.1145/3373376.3378514.
[11] S. Dave, Y. Kim, S. Avancha, K. Lee, and A. Shrivastava, "dMazeRunner: Executing Perfectly Nested Loops on Dataflow Accelerators," ACM Transactions on Embedded Computing Systems, vol. 18, no. 5s, Oct. 2019, doi: 10.1145/3358198.
[12] L. Lu et al., "TENET: A Framework for Modeling Tensor Dataflow Based on Relation-Centric Notation," in Proc. 48th Annual International Symposium on Computer Architecture (ISCA), Jun. 2021, pp. 720–733, doi: 10.1109/ISCA52012.2021.00062.
[13] T. Jin and S. Hong, "Split-CNN: Splitting Window-Based Operations in Convolutional Neural Networks for Memory System Optimization," in Proc. 24th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), Apr. 2019, pp. 835–847, doi: 10.1145/3297858.3304038.
[14] S.-C. Kao, G. Jeong, and T. Krishna, "ConfuciuX: Autonomous Hardware Resource Assignment for DNN Accelerators Using Reinforcement Learning," in Proc. 53rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), Oct. 2020, pp. 622–636, doi: 10.1109/MICRO50266.2020.00058.
[15] P. Chatarasi, H. Kwon, A. Parashar, M. Pellauer, T. Krishna, and V. Sarkar, "Marvel: A Data-Centric Approach for Mapping Deep Learning Operators on Spatial Accelerators," ACM Transactions on Architecture and Code Optimization (TACO), vol. 19, no. 1, Dec. 2021, doi: 10.1145/3485137.
[16] K. He, X. Zhang, S. Ren, and J. Sun, "Deep Residual Learning for Image Recognition," in Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 770–778.
[17] K. Simonyan and A. Zisserman, "Very Deep Convolutional Networks for Large-Scale Image Recognition," in Proc. 3rd International Conference on Learning Representations (ICLR), 2015. [Online]. Available: https://arxiv.org/abs/1409.1556v6
[18] "gem5." [Online]. Available: https://www.gem5.org/
[19] "NVIDIA Deep Learning Accelerator (NVDLA)," 2018. [Online]. Available: http://nvdla.org
[20] "onnx/onnx: Open Standard for Machine Learning Interoperability," GitHub. [Online]. Available: https://github.com/onnx/onnx (accessed Aug. 04, 2023).
[21] W.-C. Huang, "Design Space Exploration for Scalable DNN Accelerators Using a Memory-Centric Analytical Model for HW/SW Co-Design," Aug. 2023.