References
[1] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324, Nov. 1998.
[2] C. Zhang, P. Li, G. Sun, Y. Guan, B. Xiao, and J. Cong, “Optimizing FPGA-based accelerator design for deep convolutional neural networks,” in Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA), 2015, pp. 161–170.
[3] A. Dundar, J. Jin, V. Gokhale, B. Krishnamurthy, A. Canziani, B. Martini, and E. Culurciello, “Accelerating deep neural networks on mobile processor with embedded programmable logic,” in NIPS, 2013.
[4] S. Cadambi, A. Majumdar, M. Becchi, S. Chakradhar, and H. P. Graf, “A programmable parallel accelerator for learning and classification,” in Proceedings of the 19th International Conference on Parallel Architectures and Compilation Techniques, 2010, pp. 273–284.
[5] S. Chakradhar, M. Sankaradas, V. Jakkula, and S. Cadambi, “A dynamically configurable coprocessor for convolutional neural networks,” ACM SIGARCH Computer Architecture News, vol. 38, pp. 247–257, 2010.
[6] T. Chen, Z. Du, N. Sun, J. Wang, C. Wu, Y. Chen, and O. Temam, “DianNao: A small-footprint high-throughput accelerator for ubiquitous machine-learning,” SIGPLAN Not., vol. 49, no. 4, pp. 269–284, Feb. 2014.
[7] J. Cong and B. Xiao, “Minimizing computation in convolutional neural networks,” in Artificial Neural Networks and Machine Learning – ICANN 2014, Springer, 2014, pp. 281–290.
[8] C. Farabet, C. Poulet, J. Y. Han, and Y. LeCun, “CNP: An FPGA-based processor for convolutional networks,” in International Conference on Field Programmable Logic and Applications (FPL), 2009, pp. 32–37.
[9] Google, “Improving photo search: A step across the semantic gap,” http://googleresearch.blogspot.com/2013/06/improving-photo-search-step-across.html.
[10] S. Ji, W. Xu, M. Yang, and K. Yu, “3D convolutional neural networks for human action recognition,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 35, no. 1, pp. 221–231, Jan. 2013.
[11] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” in Advances in Neural Information Processing Systems 25, 2012, pp. 1097–1105.
[12] H. Larochelle, D. Erhan, A. Courville, J. Bergstra, and Y. Bengio, “An empirical evaluation of deep architectures on problems with many factors of variation,” in Proceedings of the 24th International Conference on Machine Learning (ICML ’07), 2007, pp. 473–480.
[13] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324, 1998.
[14] M. Peemen, A. A. A. Setio, B. Mesman, and H. Corporaal, “Memory-centric accelerator design for convolutional neural networks,” in IEEE 31st International Conference on Computer Design (ICCD), 2013, pp. 13–19.
[15] L.-N. Pouchet, P. Zhang, P. Sadayappan, and J. Cong, “Polyhedral-based data reuse optimization for configurable computing,” in Proceedings of the ACM/SIGDA International Symposium on Field Programmable Gate Arrays (FPGA ’13), 2013, pp. 29–38.
[16] M. Sankaradas, V. Jakkula, S. Cadambi, S. Chakradhar, I. Durdanovic, E. Cosatto, and H. P. Graf, “A massively parallel coprocessor for convolutional neural networks,” in 20th IEEE International Conference on Application-specific Systems, Architectures and Processors (ASAP), 2009, pp. 53–60.
[17] S. Williams, A. Waterman, and D. Patterson, “Roofline: An insightful visual performance model for multicore architectures,” Commun. ACM, vol. 52, no. 4, pp. 65–76, Apr. 2009.
[18] W. Zuo, Y. Liang, P. Li, K. Rupnow, D. Chen, and J. Cong, “Improving high level synthesis optimization opportunity through polyhedral transformations,” in Proceedings of the ACM/SIGDA International Symposium on Field Programmable Gate Arrays (FPGA ’13), 2013, pp. 9–18.
[19] Y. Ma, Y. Cao, S. Vrudhula, and J.-S. Seo, “Optimizing the convolution operation to accelerate deep neural networks on FPGA,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 26, no. 7, Jul. 2018.
[20] O. Russakovsky et al., “ImageNet large scale visual recognition challenge,” Int. J. Comput. Vis., vol. 115, no. 3, pp. 211–252, 2015.
[21] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” in Proc. Adv. Neural Inf. Process. Syst. (NIPS), 2012, pp. 1097–1105.
[22] M. Lin, Q. Chen, and S. Yan, “Network in network,” Mar. 2014. [Online]. Available: https://arxiv.org/abs/1312.4400
[23] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” in Proc. Int. Conf. Learn. Represent. (ICLR), 2015.
[24] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2016, pp. 770–778.
[25] D. F. Bacon, S. L. Graham, and O. J. Sharp, “Compiler transformations for high-performance computing,” ACM Comput. Surv., vol. 26, no. 4, pp. 345–420, Dec. 1994.
[26] Y.-H. Chen, T. Krishna, J. S. Emer, and V. Sze, “Eyeriss: An energy-efficient reconfigurable accelerator for deep convolutional neural networks,” IEEE J. Solid-State Circuits, vol. 52, no. 1, pp. 127–138, Jan. 2017.
[27] Y.-H. Chen, J. Emer, and V. Sze, “Eyeriss: A spatial architecture for energy-efficient dataflow for convolutional neural networks,” in Proc. ACM/IEEE Int. Symp. Comput. Archit. (ISCA), Jun. 2016, pp. 367–379.
[28] C. Zhang, P. Li, G. Sun, Y. Guan, B. Xiao, and J. Cong, “Optimizing FPGA-based accelerator design for deep convolutional neural networks,” in Proc. ACM/SIGDA Int. Symp. Field-Program. Gate Arrays (FPGA), Feb. 2015, pp. 161–170.
[29] C. Zhang, D. Wu, J. Sun, G. Sun, G. Luo, and J. Cong, “Energy-efficient CNN implementation on a deeply pipelined FPGA cluster,” in Proc. ACM Int. Symp. Low Power Electron. Design (ISLPED), Aug. 2016, pp. 326–331.
[30] N. Suda et al., “Throughput-optimized OpenCL-based FPGA accelerator for large-scale convolutional neural networks,” in Proc. ACM/SIGDA Int. Symp. Field-Program. Gate Arrays (FPGA), Feb. 2016, pp. 16–25.
[31] U. Aydonat, S. O’Connell, D. Capalija, A. C. Ling, and G. R. Chiu, “An OpenCL deep learning accelerator on Arria 10,” in Proc. ACM/SIGDA Int. Symp. Field-Program. Gate Arrays (FPGA), Feb. 2017, pp. 55–64.
[32] K. Guo et al., “Angel-Eye: A complete design flow for mapping CNN onto embedded FPGA,” IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst., vol. 37, no. 1, pp. 35–47, Jan. 2018.
[33] Y. Ma, N. Suda, Y. Cao, J.-S. Seo, and S. Vrudhula, “Scalable and modularized RTL compilation of convolutional neural networks onto FPGA,” in Proc. IEEE Int. Conf. Field-Program. Logic Appl. (FPL), Aug./Sep. 2016, pp. 1–8.
[34] Y. Ma, Y. Cao, S. Vrudhula, and J.-S. Seo, “Optimizing loop operation and dataflow in FPGA acceleration of deep convolutional neural networks,” in Proc. ACM/SIGDA Int. Symp. Field-Program. Gate Arrays (FPGA), Feb. 2017, pp. 45–54.
[35] Y. Ma, Y. Cao, S. Vrudhula, and J.-S. Seo, “An automatic RTL compiler for high-throughput FPGA implementation of diverse deep convolutional neural networks,” in Proc. IEEE Int. Conf. Field-Program. Logic Appl. (FPL), Sep. 2017, pp. 1–8.
[36] H. Li, X. Fan, L. Jiao, W. Cao, X. Zhou, and L. Wang, “A high performance FPGA-based accelerator for large-scale convolutional neural networks,” in Proc. IEEE Int. Conf. Field-Program. Logic Appl. (FPL), Aug. 2016, pp. 1–9.
[37] A. Rahman, J. Lee, and K. Choi, “Efficient FPGA acceleration of convolutional neural networks using logical-3D compute array,” in Proc. IEEE Design, Autom. Test Eur. Conf. (DATE), Mar. 2016, pp. 1393–1398.
[38] M. Motamedi, P. Gysel, V. Akella, and S. Ghiasi, “Design space exploration of FPGA-based deep convolutional neural networks,” in Proc. IEEE Asia South Pacific Design Autom. Conf. (ASP-DAC), Jan. 2016, pp. 575–580.
[39] S. Han et al., “EIE: Efficient inference engine on compressed deep neural network,” in Proc. ACM/IEEE Int. Symp. Comput. Archit. (ISCA), Jun. 2016, pp. 243–254.
[40] L. Du et al., “A reconfigurable streaming deep convolutional neural network accelerator for Internet of Things,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 65, no. 1, pp. 198–208, Jan. 2018.
[41] B. Bosi, G. Bois, and Y. Savaria, “Reconfigurable pipelined 2-D convolvers for fast digital signal processing,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 7, no. 3, pp. 299–308, Sep. 1999.
[42] Y. Guan et al., “FP-DNN: An automated framework for mapping deep neural networks onto FPGAs with RTL-HLS hybrid templates,” in Proc. IEEE Int. Symp. Field-Program. Custom Comput. Mach. (FCCM), Apr./May 2017, pp. 152–159.
[43] X. Wei et al., “Automated systolic array architecture synthesis for high throughput CNN inference on FPGAs,” in Proc. 54th Annu. Design Autom. Conf. (DAC), Jun. 2017, pp. 1–6.
[44] Y.-H. Chen, T. Krishna, J. S. Emer, and V. Sze, “Eyeriss: An energy-efficient reconfigurable accelerator for deep convolutional neural networks,” IEEE J. Solid-State Circuits, vol. 52, no. 1, pp. 127–138, Jan. 2017.
[45] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, pp. 436–444, May 2015.
[46] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” in Proc. Adv. Neural Inf. Process. Syst., vol. 25, 2012, pp. 1097–1105.
[47] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” CoRR, vol. abs/1409.1556, pp. 1–14, Sep. 2014.
[48] C. Szegedy et al., “Going deeper with convolutions,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2015, pp. 1–9.
[49] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 2016.
[50] R. Girshick, J. Donahue, T. Darrell, and J. Malik, “Rich feature hierarchies for accurate object detection and semantic segmentation,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2014, pp. 580–587.
[51] P. Sermanet, D. Eigen, X. Zhang, M. Mathieu, R. Fergus, and Y. LeCun, “OverFeat: Integrated recognition, localization and detection using convolutional networks,” CoRR, vol. abs/1312.6229, pp. 1–16, Dec. 2013.
[52] M. Bojarski et al., “End to end learning for self-driving cars,” 2016. [Online]. Available: https://arxiv.org/abs/1604.07316
[53] D. Silver et al., “Mastering the game of Go with deep neural networks and tree search,” Nature, vol. 529, no. 7587, pp. 484–489, Jan. 2016.
[54] R. Hameed et al., “Understanding sources of inefficiency in general-purpose chips,” in Proc. 37th Annu. Int. Symp. Comput. Archit., 2010, pp. 37–47.
[55] M. Horowitz, “Computing’s energy problem (and what we can do about it),” in IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers (ISSCC), Feb. 2014, pp. 10–14.
[56] M. Sankaradas et al., “A massively parallel coprocessor for convolutional neural networks,” in Proc. 20th IEEE Int. Conf. Appl.-Specific Syst., Archit. Process. (ASAP), Jul. 2009, pp. 53–60.
[57] V. Sriram, D. Cox, K. H. Tsoi, and W. Luk, “Towards an embedded biologically-inspired machine vision processor,” in Proc. Int. Conf. Field-Program. Technol. (FPT), Dec. 2010, pp. 273–278.
[58] S. Chakradhar, M. Sankaradas, V. Jakkula, and S. Cadambi, “A dynamically configurable coprocessor for convolutional neural networks,” in Proc. 37th Annu. Int. Symp. Comput. Archit., 2010, pp. 247–257.
[59] M. Peemen, A. A. A. Setio, B. Mesman, and H. Corporaal, “Memory-centric accelerator design for convolutional neural networks,” in Proc. IEEE 31st Int. Conf. Comput. Design (ICCD), Oct. 2013, pp. 13–19.
[60] V. Gokhale, J. Jin, A. Dundar, B. Martini, and E. Culurciello, “A 240 G-ops/s mobile coprocessor for deep neural networks,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. Workshops (CVPRW), Jun. 2014, pp. 696–701.
[61] T. Chen et al., “DianNao: A small-footprint high-throughput accelerator for ubiquitous machine-learning,” in Proc. 19th Int. Conf. Archit. Support Program. Lang. Oper. Syst., 2014, pp. 269–284.
[62] Z. Du et al., “ShiDianNao: Shifting vision processing closer to the sensor,” in Proc. 42nd Annu. Int. Symp. Comput. Archit., 2015, pp. 92–104.
[63] Y. Chen et al., “DaDianNao: A machine-learning supercomputer,” in Proc. 47th Annu. IEEE/ACM Int. Symp. Microarchitecture, 2014, pp. 609–622.
[64] S. Gupta, A. Agrawal, K. Gopalakrishnan, and P. Narayanan, “Deep learning with limited numerical precision,” CoRR, vol. abs/1502.02551, pp. 1–10, Feb. 2015.
[65] C. Zhang, P. Li, G. Sun, Y. Guan, B. Xiao, and J. Cong, “Optimizing FPGA-based accelerator design for deep convolutional neural networks,” in Proc. ACM/SIGDA Int. Symp. Field-Program. Gate Arrays, 2015, pp. 161–170.
[66] F. Conti and L. Benini, “A ultra-low-energy convolution engine for fast brain-inspired vision in multicore clusters,” in Proc. Design, Autom. Test Eur. Conf. Exhibit. (DATE), 2015, pp. 683–688.
[67] S. Park, K. Bong, D. Shin, J. Lee, S. Choi, and H.-J. Yoo, “A 1.93TOPS/W scalable deep learning/inference processor with tetra-parallel MIMD architecture for big-data applications,” in Proc. IEEE Int. Solid-State Circuits Conf. (ISSCC), Feb. 2015, pp. 1–3.
[68] L. Cavigelli, D. Gschwend, C. Mayer, S. Willi, B. Muheim, and L. Benini, “Origami: A convolutional network accelerator,” in Proc. 25th Ed. Great Lakes Symp. VLSI, 2015, pp. 199–204.
[69] J. Sim, J.-S. Park, M. Kim, D. Bae, Y. Choi, and L.-S. Kim, “A 1.42TOPS/W deep convolutional neural network recognition processor for intelligent IoE systems,” in Proc. IEEE Int. Solid-State Circuits Conf. (ISSCC), Jan./Feb. 2016, pp. 264–265.
[70] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” Proc. IEEE, vol. 86, no. 11, pp. 2278–2324, Nov. 1998.
[71] O. Russakovsky et al., “ImageNet large scale visual recognition challenge,” Int. J. Comput. Vis., vol. 115, no. 3, pp. 211–252, Dec. 2015.
[72] Y. Jia et al., “Caffe: Convolutional architecture for fast feature embedding,” 2014. [Online]. Available: https://arxiv.org/abs/1408.5093
[73] Y.-H. Chen, An 1000-Class Image Classification Task Performed by the Eyeriss-Integrated Deep Learning System, accessed 2016. [Online]. Available: https://vimeo.com/154012013
[74] Y. LeCun, K. Kavukcuoglu, and C. Farabet, “Convolutional networks and applications in vision,” in Proc. IEEE Int. Symp. Circuits Syst. (ISCAS), May/Jun. 2010, pp. 253–256.
[75] V. Nair and G. E. Hinton, “Rectified linear units improve restricted Boltzmann machines,” in Proc. 27th Int. Conf. Mach. Learn. (ICML), 2010, pp. 807–814.
[76] Y.-H. Chen, J. Emer, and V. Sze, “Eyeriss: A spatial architecture for energy-efficient dataflow for convolutional neural networks,” in Proc. 43rd Annu. Int. Symp. Comput. Archit. (ISCA), 2016, pp. 367–379.
[77] S. Han, J. Pool, J. Tran, and W. Dally, “Learning both weights and connections for efficient neural network,” in Proc. Adv. Neural Inf. Process. Syst., vol. 28, 2015, pp. 1135–1143.
[78] J. Howard et al., “A 48-core IA-32 message-passing processor with DVFS in 45 nm CMOS,” in IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers (ISSCC), Feb. 2010, pp. 108–109.
[79] B. K. Daya et al., “SCORPIO: A 36-core research chip demonstrating snoopy coherence on a scalable mesh NoC with in-network ordering,” in Proc. 41st Annu. Int. Symp. Comput. Archit. (ISCA), Jun. 2014, pp. 25–36.
[80] A. Ardakani, C. Condo, M. Ahmadi, and W. J. Gross, “An architecture to accelerate convolution in deep neural networks,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 65, no. 4, Apr. 2018.
[81] M. Hailesellasie and S. R. Hasan, “A fast FPGA-based deep convolutional neural network using pseudo parallel memories,” IEEE Xplore, 28 Sep. 2017.
[82] S. Wang, D. Zhou, X. Han, and T. Yoshimura, “Chain-NN: An energy-efficient 1D chain architecture for accelerating deep convolutional neural networks,” 2017. [Online]. Available: https://arxiv.org/abs/1703.01457
[83] D. Shin, J. Lee, J. Lee, and H.-J. Yoo, “14.2 DNPU: An 8.1TOPS/W reconfigurable CNN-RNN processor for general-purpose deep neural networks,” in IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, Feb. 2017, pp. 240–241.
[84] B. Moons, R. Uytterhoeven, W. Dehaene, and M. Verhelst, “Envision: A 0.26-to-10 TOPS/W subword-parallel dynamic-voltage-accuracy-frequency-scalable convolutional neural network processor in 28 nm FDSOI,” in IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, Feb. 2017, pp. 246–247.