臺灣博碩士論文加值系統 (National Digital Library of Theses and Dissertations in Taiwan)
詳目顯示 (Detailed Record)
Author: 楊承祥
Author (English): Cheng-Hsiang Yang
Title: 加速深度學習之卷積神經網路的VLSI設計與實現
Title (English): VLSI Design and Implementation to Accelerate Deep Learning Convolution Neural Network
Advisor: 賴永康
Advisor (English): Yeong-Kang Lai
Committee: 溫志煜、鍾育杰
Committee (English): Chih-Yu Wen, Yu-Jie Zhong
Defense date: 2019-07-23
Degree: Master's
Institution: 國立中興大學 (National Chung Hsing University)
Department: 電機工程學系所 (Department of Electrical Engineering)
Discipline: Engineering
Field: Electrical and Information Engineering
Thesis type: Academic thesis
Publication year: 2019
Academic year of graduation: 107
Language: Chinese
Pages: 83
Keywords (Chinese): 深度學習、IC設計
Keywords (English): Deep Learning, IC Design
Metrics:
  • Cited: 0
  • Views: 855
  • Rating: (none)
  • Downloads: 89
  • Bookmarked: 2
Abstract:
In today's convolutional neural networks (CNNs), more than 90% of the computation comes from convolution, so the efficiency and performance of the convolution hardware accelerator determine the inference speed of the entire network.
Convolution consists of multiply-accumulate (MAC) operations nested in four levels of loops, and a naive hardware implementation yields excessive circuit area and cost. Previous work applied a limited set of loop-optimization techniques, such as loop unrolling, loop tiling, and loop interchange, making only partial adjustments to the accelerator architecture and data flow. This work instead optimizes the efficiency of the whole system, including the accelerator chip and DRAM, by reconfiguring the architecture for use with a variety of CNNs. CNNs are widely used in modern artificial-intelligence systems, but they challenge the underlying hardware in throughput and energy efficiency: their computation touches large amounts of data, and the resulting on-chip and off-chip data movement consumes more energy than the computation itself. Minimizing the data-movement cost across different CNNs is therefore the key to high throughput and high energy efficiency.
This thesis studies how to optimize the data movement of the convolution operation, implements an efficient hardware acceleration scheme, and manages the data flow explicitly; if the convolution data flow is not optimized before the hardware design begins, the final accelerator can hardly use and move data effectively. The proposed CNN acceleration scheme and architecture are demonstrated by running end-to-end inference of LeNet while optimizing data reuse to improve performance. The overall throughput of the architecture is 7.4 GOPS.
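The four loop levels mentioned in the abstract can be made concrete with a short sketch. The following is a minimal reference model of direct convolution, not the thesis's RTL; the loop naming follows the common Loop-1 to Loop-4 formulation from the FPGA-acceleration literature (kernel-window MACs, input feature maps, output pixels, output feature maps), and all tensor shapes are hypothetical:

```python
import numpy as np

def conv_layer(ifmap, weights):
    """Direct convolution, stride 1, no padding.

    ifmap:   (C_in, H, W)        input feature maps
    weights: (C_out, C_in, K, K) kernel weights
    returns: (C_out, H-K+1, W-K+1) output feature maps
    """
    c_in, h, w = ifmap.shape
    c_out, _, k, _ = weights.shape
    oh, ow = h - k + 1, w - k + 1
    ofmap = np.zeros((c_out, oh, ow))
    for co in range(c_out):                 # Loop-4: across output feature maps
        for ci in range(c_in):              # Loop-3: across input feature maps
            for y in range(oh):             # Loop-2: scan output pixels
                for x in range(ow):
                    for ky in range(k):     # Loop-1: MACs within one kernel window
                        for kx in range(k):
                            ofmap[co, y, x] += (
                                ifmap[ci, y + ky, x + kx]
                                * weights[co, ci, ky, kx]
                            )
    return ofmap
```

Every iteration of the innermost loops reads one input pixel and one weight, which is why the accelerator's data-reuse strategy (how loop unrolling, tiling, and interchange are applied to this nest) dominates on-chip/off-chip traffic and hence energy.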
目錄 (Table of Contents)
Chapter 1  Introduction
  1. Deep Learning
  2. CNN Artificial Neural Networks
  3. Hardware Acceleration
    (1) Acceleration of Convolutional Loops
  4. Thesis Organization
Chapter 2  CNN Hardware Acceleration and Related Literature
  1. Survey of CNN Hardware Accelerators
    (1) Optimizing FPGA-based Accelerator Design for Deep Convolutional Neural Networks
    (2) Optimizing the Convolution Operation to Accelerate Deep Neural Networks on FPGA
    (3) Eyeriss: An Energy-Efficient Reconfigurable Accelerator for Deep Convolutional Neural Networks
    (4) An Architecture to Accelerate Convolution in Deep Neural Networks
    (5) A Fast FPGA-based Deep Convolutional Neural Network Using Pseudo Parallel Memories
Chapter 3  Simulation Results of the LeNet Network for CNN
  1. Overview
  2. LeNet Network Architecture
  3. Simulation Results
Chapter 4  Hardware Architecture Design and Implementation
  1. Overview
  2. Hardware Specifications
  3. Hardware Architecture Design
  4. Architecture of Each Unit
    (1) Data Bus from Buffer to PE (Data2PE)
    (2) Convolution PE Architecture
    (3) Max Pooling
    (4) Fully Connected
    (5) Parallel Convolution PEs
  5. Implementation Results
    (1) Digital IC Design Flow
    (2) Chip Specifications
    (3) Synthesis
    (4) Layout
Chapter 5  Conclusion
References
References
[1]Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," Proc. IEEE, Nov. 1998.
[2]C. Zhang, P. Li, G. Sun, Y. Guan, B. Xiao, and J. Cong, "Optimizing FPGA-based accelerator design for deep convolutional neural networks," in Proc. ACM/SIGDA Int. Symp. Field-Programmable Gate Arrays (FPGA), 2015.
[3]D. Aysegul, J. Jonghoon, G. Vinayak, K. Bharadwaj, C. Alfredo, M. Berin, and C. Eugenio, "Accelerating deep neural networks on mobile processor with embedded programmable logic," in NIPS 2013. IEEE, 2013.
[4]S. Cadambi, A. Majumdar, M. Becchi, S. Chakradhar, and H. P. Graf. A programmable parallel accelerator for learning and classification. In Proceedings of the 19th international conference on Parallel architectures and compilation techniques, pages 273–284. ACM, 2010.
[5]S. Chakradhar, M. Sankaradas, V. Jakkula, and S. Cadambi. A dynamically configurable coprocessor for convolutional neural networks. In ACM SIGARCH Computer Architecture News, volume 38, pages 247–257. ACM, 2010.
[6]T. Chen, Z. Du, N. Sun, J. Wang, C. Wu, Y. Chen, and O. Temam. Diannao: A small-footprint high-throughput accelerator for ubiquitous machine-learning. SIGPLAN Not., 49(4):269–284, Feb. 2014.
[7]J. Cong and B. Xiao. Minimizing computation in convolutional neural networks. In Artificial Neural Networks and Machine Learning–ICANN 2014, pages 281–290. Springer, 2014.
[8]C. Farabet, C. Poulet, J. Y. Han, and Y. LeCun. Cnp: An fpga-based processor for convolutional networks. In Field Programmable Logic and Applications, 2009. FPL 2009. International Conference on, pages 32–37. IEEE, 2009.
[9]Google. Improving photo search: A step across the semantic gap. http://googleresearch.blogspot.com/2013/06/improving-photo-search-step-across.html.
[10]S. Ji, W. Xu, M. Yang, and K. Yu. 3d convolutional neural networks for human action recognition. IEEE Trans. Pattern Anal. Mach. Intell., 35(1):221–231, Jan. 2013
[11]A. Krizhevsky, I. Sutskever, and G. E. Hinton. Imagenet classification with deep convolutional neural networks. In F. Pereira, C. Burges, L. Bottou, and K. Weinberger, editors, Advances in Neural Information Processing Systems 25, pages 1097–1105. Curran Associates, Inc., 2012.
[12]H. Larochelle, D. Erhan, A. Courville, J. Bergstra, and Y. Bengio. An empirical evaluation of deep architectures on problems with many factors of variation. In Proceedings of the 24th International Conference on Machine Learning, ICML ’07, pages 473–480, New York, NY, USA, 2007. ACM.
[13]Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.
[14]M. Peemen, A. A. Setio, B. Mesman, and H. Corporaal. Memory-centric accelerator design for convolutional neural networks. In Computer Design (ICCD), 2013 IEEE 31st International Conference on, pages 13–19. IEEE, 2013.
[15]L.-N. Pouchet, P. Zhang, P. Sadayappan, and J. Cong. Polyhedral-based data reuse optimization for configurable computing. In Proceedings of the ACM/SIGDA International Symposium on Field Programmable Gate Arrays, FPGA ’13, pages 29–38, New York, NY, USA, 2013. ACM.
[16]M. Sankaradas, V. Jakkula, S. Cadambi, S. Chakradhar, I. Durdanovic, E. Cosatto, and H. P. Graf. A massively parallel coprocessor for convolutional neural networks. In Application-specific Systems, Architectures and Processors, 2009. ASAP 2009. 20th IEEE International Conference on, pages 53–60. IEEE, 2009.
[17]S. Williams, A. Waterman, and D. Patterson. Roofline: An insightful visual performance model for multicore architectures. Commun. ACM, 52(4):65–76, Apr. 2009.
[18]W. Zuo, Y. Liang, P. Li, K. Rupnow, D. Chen, and J. Cong. Improving high level synthesis optimization opportunity through polyhedral transformations. In Proceedings of the ACM/SIGDA International Symposium on Field Programmable Gate Arrays, FPGA ’13, pages 9–18, New York, NY, USA, 2013. ACM.
[19]Y. Ma, Y. Cao, S. Vrudhula, and J.-S. Seo, "Optimizing the convolution operation to accelerate deep neural networks on FPGA," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 26, no. 7, Jul. 2018.
[20]O. Russakovsky et al., “ImageNet large scale visual recognition challenge,” Int. J. Comput. Vis., vol. 115, no. 3, pp. 211–252, 2015.
[21]A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” in Proc. Adv. Neural Inf. Process. Syst. (NIPS), 2012, pp. 1097–1105.
[22]M. Lin, Q. Chen, and S. Yan. (Mar. 2014). “Network in network.” [Online]. Available: https://arxiv.org/abs/1312.4400
[23]K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” in Proc. Int. Conf. Learn. Represent. (ICLR), 2015.
[24]K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2016, pp. 770–778.
[25]D. F. Bacon, S. L. Graham, and O. J. Sharp, “Compiler transformations for high-performance computing,” ACM Comput. Surv., vol. 26, no. 4, pp. 345–420, Dec. 1994.
[26]Y.-H. Chen, T. Krishna, J. S. Emer, and V. Sze, "Eyeriss: An energy-efficient reconfigurable accelerator for deep convolutional neural networks," IEEE J. Solid-State Circuits, vol. 52, no. 1, pp. 127–138, Jan. 2017.
[27]Y.-H. Chen, J. Emer, and V. Sze, “Eyeriss: A spatial architecture for energy-efficient dataflow for convolutional neural networks,” in Proc. ACM/IEEE Int. Symp. Comput. Archit. (ISCA), Jun. 2016, pp. 367–379.
[28]C. Zhang, P. Li, G. Sun, Y. Guan, B. Xiao, and J. Cong, “Optimizing FPGA-based accelerator design for deep convolutional neural networks,” in Proc. ACM/SIGDA Int. Symp. Field-Program. Gate Arrays (FPGA), Feb. 2015, pp. 161–170.
[29]C. Zhang, D. Wu, J. Sun, G. Sun, G. Luo, and J. Cong, “Energy-efficient CNN implementation on a deeply pipelined FPGA cluster,” in Proc. ACM Int. Symp. Low Power Electron. Design (ISLPED), Aug. 2016, pp. 326–331.
[30]N. Suda et al., “Throughput-optimized OpenCL-based FPGA accelerator for large-scale convolutional neural networks,” in Proc. ACM/SIGDA Int. Symp. Field-Program. Gate Arrays (FPGA), Feb. 2016, pp. 16–25.
[31]U. Aydonat, S. O’Connell, D. Capalija, A. C. Ling, and G. R. Chiu, “An OpenCL deep learning accelerator on Arria 10,” in Proc. ACM/SIGDA Symp. Field-Program. Gate Arrays (FPGA), Feb. 2017, pp. 55–64.
[32]K. Guo et al., "Angel-Eye: A complete design flow for mapping CNN onto embedded FPGA," IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst., vol. 37, no. 1, pp. 35–47, Jan. 2018.
[33]Y. Ma, N. Suda, Y. Cao, J.-S. Seo, and S. Vrudhula, “Scalable and modularized RTL compilation of convolutional neural networks onto FPGA,” in Proc. IEEE Int. Conf. Field-Program. Logic Appl. (FPL), Aug./Sep. 2016, pp. 1–8.
[34]Y. Ma, Y. Cao, S. Vrudhula, and J.-S. Seo, “Optimizing loop operation and dataflow in FPGA acceleration of deep convolutional neural networks,” in Proc. ACM/SIGDA Int. Symp. Field-Program. Gate Arrays (FPGA), Feb. 2017, pp. 45–54.
[35]Y. Ma, Y. Cao, S. Vrudhula, and J.-S. Seo, “An automatic RTL compiler for high-throughput FPGA implementation of diverse deep convolutional neural networks,” in Proc. IEEE Int. Conf. Field-Program. Logic Appl. (FPL), Sep. 2017, pp. 1–8.
[36]H. Li, X. Fan, L. Jiao, W. Cao, X. Zhou, and L. Wang, “A high performance FPGA-based accelerator for large-scale convolutional neural networks,” in Proc. IEEE Int. Conf. Field-Program. Logic Appl. (FPL), Aug. 2016, pp. 1–9.
[37]A. Rahman, J. Lee, and K. Choi, “Efficient FPGA acceleration of convolutional neural networks using logical-3D compute array,” in Proc. IEEE Design, Auto. Test Eur. Conf. (DATE), Mar. 2016, pp. 1393–1398.
[38]M. Motamedi, P. Gysel, V. Akella, and S. Ghiasi, “Design space exploration of FPGA-based deep convolutional neural networks,” in Proc. IEEE Asia South Pacific Design Auto. Conf. (ASP-DAC), Jan. 2016, pp. 575–580.
[39]S. Han et al., “EIE: Efficient inference engine on compressed deep neural network,” in Proc. ACM/IEEE Int. Symp. Comput. Archit. (ISCA), Jun. 2016, pp. 243–254.
[40]L. Du et al., “A reconfigurable streaming deep convolutional neural network accelerator for Internet of Things,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 65, no. 1, pp. 198–208, Jan. 2018.
[41]B. Bosi, G. Bois, and Y. Savaria, “Reconfigurable pipelined 2-D convolvers for fast digital signal processing,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 7, no. 3, pp. 299–308, Sep. 1999.
[42]Y. Guan et al., “FP-DNN: an automated framework for mapping deep neural networks onto FPGAs with RTL-HLS hybrid templates,” in Proc. IEEE Int. Symp. Field-Program. Custom Comput. Mach. (FCCM), Apr./May 2017, pp. 152–159.
[43]X. Wei et al., “Automated systolic array architecture synthesis for high throughput CNN inference on FPGAs,” in Proc. ACM the 54th Annu. Design Autom. Conf. (DAC), Jun. 2017, pp. 1–6.
[44]Yu-Hsin Chen, Student Member, Tushar Krishna, Member, Joel S. Emer, Fellow, and Vivienne Sze, Senior Member, “Eyeriss: An Energy-Efficient Reconfigurable accelerator for Deep Convolutional Neural Networks” ,IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 52, NO. 1, JANUARY 2017.
[45]Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, pp. 436–444, May 2015.
[46]A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” in Proc. Adv. Neural Inf. Process. Syst., vol. 25. 2012, pp. 1097–1105.
[47]K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” CoRR, vol. abs/1409.1556, pp. 1–14, Sep. 2014.
[48]C. Szegedy et al., “Going deeper with convolutions,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2015, pp. 1–9.
[49]K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 2016.
[50]R. Girshick, J. Donahue, T. Darrell, and J. Malik, “Rich feature hierarchies for accurate object detection and semantic segmentation,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2014, pp. 580–587.
[51]P. Sermanet, D. Eigen, X. Zhang, M. Mathieu, R. Fergus, and Y. LeCun, “OverFeat: Integrated recognition, localization and detection using convolutional networks,” CoRR, vol. abs/1312.6229, pp. 1–16, Dec. 2013.
[52]M. Bojarski et al. (2016). “End to end learning for self-driving cars.” [Online]. Available: https://arxiv.org/abs/1604.07316.
[53]D. Silver et al., “Mastering the game of Go with deep neural networks and tree search,” Nature, vol. 529, no. 7587, pp. 484–489, Jan. 2016.
[54]R. Hameed et al., "Understanding sources of inefficiency in general-purpose chips," in Proc. 37th Annu. Int. Symp. Comput. Archit., 2010, pp. 37–47.
[55]M. Horowitz, “Computing’s energy problem (and what we can do about it),” in IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers (ISSCC), Feb. 2014, pp. 10–14.
[56]M. Sankaradas et al., “A massively parallel coprocessor for convolutional neural networks,” in Proc. 20th IEEE Int. Conf. Appl.-Specific Syst., Archit. Process., Jul. 2009, pp. 53–60.
[57]V. Sriram, D. Cox, K. H. Tsoi, and W. Luk, “Towards an embedded biologically-inspired machine vision processor,” in Proc. Int. Conf. Field-Program. Technol. (FPT), Dec. 2010, pp. 273–278.
[58]S. Chakradhar, M. Sankaradas, V. Jakkula, and S. Cadambi, “A dynamically configurable coprocessor for convolutional neural networks,” in Proc. 37th Annu. Int. Symp. Comput. Archit., 2010, pp. 247–257.
[59]M. Peemen, A. A. A. Setio, B. Mesman, and H. Corporaal, "Memory-centric accelerator design for convolutional neural networks," in Proc. IEEE 31st Int. Conf. Comput. Design (ICCD), Oct. 2013, pp. 13–19.
[60]V. Gokhale, J. Jin, A. Dundar, B. Martini, and E. Culurciello, “A 240 G-ops/s mobile coprocessor for deep neural networks,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. Workshops (CVPRW), Jun. 2014, pp. 696–701.
[61]T. Chen et al., “DianNao: A small-footprint high-throughput accelerator for ubiquitous machine-learning,” in Proc. 19th Int. Conf. Archit. Support Program. Lang. Oper. Syst., 2014, pp. 269–284.
[62]Z. Du et al., “ShiDianNao: Shifting vision processing closer to the sensor,” in Proc. 42nd Annu. Int. Symp. Comput. Archit., 2015, pp. 92–104.
[63]Y. Chen et al., “DaDianNao: A machine-learning supercomputer,” in Proc. 47th Annu. IEEE/ACM Int. Symp. Microarchitecture, 2014, pp. 609–622.
[64]S. Gupta, A. Agrawal, K. Gopalakrishnan, and P. Narayanan, “Deep learning with limited numerical precision,” CoRR, vol. abs/1502.02551, pp. 1–10, Feb. 2015.
[65]C. Zhang, P. Li, G. Sun, Y. Guan, B. Xiao, and J. Cong, “Optimizing FPGA-based accelerator design for deep convolutional neural networks,” in Proc. ACM/SIGDA Int. Symp. Field-Program. Gate Arrays, 2015, pp. 161–170.
[66]F. Conti and L. Benini, “A ultra-low-energy convolution engine for fast brain-inspired vision in multicore clusters,” in Proc. Design, Autom. Test Eur. Conf. Exhibit., 2015, pp. 683–688.
[67]S. Park, K. Bong, D. Shin, J. Lee, S. Choi, and H.-J. Yoo, "A 1.93TOPS/W scalable deep learning/inference processor with tetra-parallel MIMD architecture for big-data applications," in Proc. IEEE Int. Solid-State Circuits Conf. (ISSCC), Feb. 2015, pp. 1–3.
[68]L. Cavigelli, D. Gschwend, C. Mayer, S. Willi, B. Muheim, and L. Benini, “Origami: A convolutional network accelerator,” in Proc. 25th Ed. Great Lakes Symp. VLSI, 2015, pp. 199–204.
[69]J. Sim, J.-S. Park, M. Kim, D. Bae, Y. Choi, and L.-S. Kim, “A 1.42TOPS/W deep convolutional neural network recognition processor for intelligent IoE systems,” in Proc. IEEE Int. Solid-State Circuits Conf. (ISSCC), Jan./Feb. 2016, pp. 264–265.
[70]Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” Proc. IEEE, vol. 86, no. 11, pp. 2278–2324, Nov. 1998.
[71]O. Russakovsky et al., “ImageNet large scale visual recognition challenge,” Int. J. Comput. Vis., vol. 115, no. 3, pp. 211–252, Dec. 2015.
[72]Y. Jia et al. (2014). “Caffe: Convolutional architecture for fast feature embedding.” [Online]. Available: https://arxiv.org/abs/1408.5093.
[73]Y.-H. Chen. An 1000-Class Image Classification Task Performed by the Eyeriss-Integrated Deep Learning System, accessed on 2016. [Online]. Available: https://vimeo.com/154012013.
[74]Y. LeCun, K. Kavukcuoglu, and C. Farabet, “Convolutional networks and applications in vision,” in Proc. IEEE Int. Symp. Circuits Syst. (ISCAS), May/Jun. 2010, pp. 253–256.
[75]V. Nair and G. E. Hinton, “Rectified linear units improve restricted Boltzmann machines,” in Proc. 27th Int. Conf. Mach. Learn. (ICML), 2010, pp. 807–814.
[76]Y.-H. Chen, J. Emer, and V. Sze, “Eyeriss: A spatial architecture for energy-efficient dataflow for convolutional neural networks,” in Proc. 43rd Annu. Int. Symp. Comput. Archit. (ISCA), 2016, pp. 367–379.
[77]S. Han, J. Pool, J. Tran, and W. Dally, “Learning both weights and connections for efficient neural network,” in Proc. Adv. Neural Inf. Process. Syst., vol. 28. 2015, pp. 1135–1143.
[78]J. Howard et al., “A 48-core IA-32 message-passing processor with DVFS in 45 nm CMOS,” in IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers (ISSCC), Feb. 2010, pp. 108–109.
[79]B. K. Daya et al., “SCORPIO: A 36-core research chip demonstrating snoopy coherence on a scalable mesh NoC with in-network ordering,” in Proc. 41st Annu. Int. Symp. Comput. Archit. (ISCA), Jun. 2014, pp. 25–36.
[80]A. Ardakani, C. Condo, M. Ahmadi, and W. J. Gross, "An architecture to accelerate convolution in deep neural networks," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 65, no. 4, Apr. 2018.
[81]M. Hailesellasie and S. R. Hasan, "A fast FPGA-based deep convolutional neural network using pseudo parallel memories," IEEE, Sep. 2017.
[82]S. Wang, D. Zhou, X. Han, and T. Yoshimura. (2017). “Chain-NN: An energy-efficient 1D chain architecture for accelerating deep convolutional neural networks.” [Online]. Available: https://arxiv.org/abs/1703.01457
[83]D. Shin, J. Lee, J. Lee, and H.-J. Yoo, “14.2 DNPU: An 8.1TOPS/W reconfigurable CNN-RNN processor for general-purpose deep neural networks, ” in IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, Feb. 2017, pp. 240-241.
[84]B. Moons, R. Uytterhoeven, W. Dehaene, and M. Verhelst, “Envision: A 0.26-to-10 TOPS/W subword-parallel dynamic-voltage-accuracyfrequency-scalable convolutional neural network processor in 28 nm FDSOI,” in IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, Feb. 2017, pp. 246–247.