References
[1] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324, Nov. 1998.
[2] C. Zhang, P. Li, G. Sun, Y. Guan, B. Xiao, and J. Cong, “Optimizing FPGA-based accelerator design for deep convolutional neural networks,” in Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA), 2015, pp. 161–170.
[3] A. Dundar, J. Jin, V. Gokhale, B. Krishnamurthy, A. Canziani, B. Martini, and E. Culurciello, “Accelerating deep neural networks on mobile processor with embedded programmable logic,” in NIPS, 2013.
[4] S. Cadambi, A. Majumdar, M. Becchi, S. Chakradhar, and H. P. Graf, “A programmable parallel accelerator for learning and classification,” in Proceedings of the 19th International Conference on Parallel Architectures and Compilation Techniques, 2010, pp. 273–284.
[5] S. Chakradhar, M. Sankaradas, V. Jakkula, and S. Cadambi, “A dynamically configurable coprocessor for convolutional neural networks,” ACM SIGARCH Computer Architecture News, vol. 38, pp. 247–257, 2010.
[6] T. Chen, Z. Du, N. Sun, J. Wang, C. Wu, Y. Chen, and O. Temam, “DianNao: A small-footprint high-throughput accelerator for ubiquitous machine-learning,” SIGPLAN Not., vol. 49, no. 4, pp. 269–284, Feb. 2014.
[7] J. Cong and B. Xiao, “Minimizing computation in convolutional neural networks,” in Artificial Neural Networks and Machine Learning – ICANN 2014, Springer, 2014, pp. 281–290.
[8] C. Farabet, C. Poulet, J. Y. Han, and Y. LeCun, “CNP: An FPGA-based processor for convolutional networks,” in International Conference on Field Programmable Logic and Applications (FPL), 2009, pp. 32–37.
[9] Google, “Improving photo search: A step across the semantic gap,” http://googleresearch.blogspot.com/2013/06/improving-photo-search-step-across.html.
[10] S. Ji, W. Xu, M. Yang, and K. Yu, “3D convolutional neural networks for human action recognition,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 35, no. 1, pp. 221–231, Jan. 2013.
[11] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” in Advances in Neural Information Processing Systems 25, 2012, pp. 1097–1105.
[12] H. Larochelle, D. Erhan, A. Courville, J. Bergstra, and Y. Bengio, “An empirical evaluation of deep architectures on problems with many factors of variation,” in Proceedings of the 24th International Conference on Machine Learning (ICML ’07), 2007, pp. 473–480.
[13] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324, 1998.
[14] M. Peemen, A. A. A. Setio, B. Mesman, and H. Corporaal, “Memory-centric accelerator design for convolutional neural networks,” in IEEE 31st International Conference on Computer Design (ICCD), 2013, pp. 13–19.
[15] L.-N. Pouchet, P. Zhang, P. Sadayappan, and J. Cong, “Polyhedral-based data reuse optimization for configurable computing,” in Proceedings of the ACM/SIGDA International Symposium on Field Programmable Gate Arrays (FPGA ’13), 2013, pp. 29–38.
[16] M. Sankaradas, V. Jakkula, S. Cadambi, S. Chakradhar, I. Durdanovic, E. Cosatto, and H. P. Graf, “A massively parallel coprocessor for convolutional neural networks,” in 20th IEEE International Conference on Application-specific Systems, Architectures and Processors (ASAP), 2009, pp. 53–60.
[17] S. Williams, A. Waterman, and D. Patterson, “Roofline: An insightful visual performance model for multicore architectures,” Commun. ACM, vol. 52, no. 4, pp. 65–76, Apr. 2009.
[18] W. Zuo, Y. Liang, P. Li, K. Rupnow, D. Chen, and J. Cong, “Improving high level synthesis optimization opportunity through polyhedral transformations,” in Proceedings of the ACM/SIGDA International Symposium on Field Programmable Gate Arrays (FPGA ’13), 2013, pp. 9–18.
[19] Y. Ma, Y. Cao, S. Vrudhula, and J.-S. Seo, “Optimizing the convolution operation to accelerate deep neural networks on FPGA,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 26, no. 7, Jul. 2018.
[20] O. Russakovsky et al., “ImageNet large scale visual recognition challenge,” Int. J. Comput. Vis., vol. 115, no. 3, pp. 211–252, 2015.
[21] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” in Proc. Adv. Neural Inf. Process. Syst. (NIPS), 2012, pp. 1097–1105.
[22] M. Lin, Q. Chen, and S. Yan, “Network in network,” Mar. 2014. [Online]. Available: https://arxiv.org/abs/1312.4400
[23] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” in Proc. Int. Conf. Learn. Represent. (ICLR), 2015.
[24] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2016, pp. 770–778.
[25] D. F. Bacon, S. L. Graham, and O. J. Sharp, “Compiler transformations for high-performance computing,” ACM Comput. Surv., vol. 26, no. 4, pp. 345–420, Dec. 1994.
[26] Y.-H. Chen, T. Krishna, J. S. Emer, and V. Sze, “Eyeriss: An energy-efficient reconfigurable accelerator for deep convolutional neural networks,” IEEE J. Solid-State Circuits, vol. 52, no. 1, pp. 127–138, Jan. 2017.
[27] Y.-H. Chen, J. Emer, and V. Sze, “Eyeriss: A spatial architecture for energy-efficient dataflow for convolutional neural networks,” in Proc. ACM/IEEE Int. Symp. Comput. Archit. (ISCA), Jun. 2016, pp. 367–379.
[28] C. Zhang, P. Li, G. Sun, Y. Guan, B. Xiao, and J. Cong, “Optimizing FPGA-based accelerator design for deep convolutional neural networks,” in Proc. ACM/SIGDA Int. Symp. Field-Program. Gate Arrays (FPGA), Feb. 2015, pp. 161–170.
[29] C. Zhang, D. Wu, J. Sun, G. Sun, G. Luo, and J. Cong, “Energy-efficient CNN implementation on a deeply pipelined FPGA cluster,” in Proc. ACM Int. Symp. Low Power Electron. Design (ISLPED), Aug. 2016, pp. 326–331.
[30] N. Suda et al., “Throughput-optimized OpenCL-based FPGA accelerator for large-scale convolutional neural networks,” in Proc. ACM/SIGDA Int. Symp. Field-Program. Gate Arrays (FPGA), Feb. 2016, pp. 16–25.
[31] U. Aydonat, S. O’Connell, D. Capalija, A. C. Ling, and G. R. Chiu, “An OpenCL deep learning accelerator on Arria 10,” in Proc. ACM/SIGDA Int. Symp. Field-Program. Gate Arrays (FPGA), Feb. 2017, pp. 55–64.
[32] K. Guo et al., “Angel-Eye: A complete design flow for mapping CNN onto embedded FPGA,” IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst., vol. 37, no. 1, pp. 35–47, Jan. 2018.
[33] Y. Ma, N. Suda, Y. Cao, J.-S. Seo, and S. Vrudhula, “Scalable and modularized RTL compilation of convolutional neural networks onto FPGA,” in Proc. IEEE Int. Conf. Field-Program. Logic Appl. (FPL), Aug./Sep. 2016, pp. 1–8.
[34] Y. Ma, Y. Cao, S. Vrudhula, and J.-S. Seo, “Optimizing loop operation and dataflow in FPGA acceleration of deep convolutional neural networks,” in Proc. ACM/SIGDA Int. Symp. Field-Program. Gate Arrays (FPGA), Feb. 2017, pp. 45–54.
[35] Y. Ma, Y. Cao, S. Vrudhula, and J.-S. Seo, “An automatic RTL compiler for high-throughput FPGA implementation of diverse deep convolutional neural networks,” in Proc. IEEE Int. Conf. Field-Program. Logic Appl. (FPL), Sep. 2017, pp. 1–8.
[36] H. Li, X. Fan, L. Jiao, W. Cao, X. Zhou, and L. Wang, “A high performance FPGA-based accelerator for large-scale convolutional neural networks,” in Proc. IEEE Int. Conf. Field-Program. Logic Appl. (FPL), Aug. 2016, pp. 1–9.
[37] A. Rahman, J. Lee, and K. Choi, “Efficient FPGA acceleration of convolutional neural networks using logical-3D compute array,” in Proc. IEEE Design, Autom. Test Eur. Conf. (DATE), Mar. 2016, pp. 1393–1398.
[38] M. Motamedi, P. Gysel, V. Akella, and S. Ghiasi, “Design space exploration of FPGA-based deep convolutional neural networks,” in Proc. IEEE Asia South Pacific Design Autom. Conf. (ASP-DAC), Jan. 2016, pp. 575–580.
[39] S. Han et al., “EIE: Efficient inference engine on compressed deep neural network,” in Proc. ACM/IEEE Int. Symp. Comput. Archit. (ISCA), Jun. 2016, pp. 243–254.
[40] L. Du et al., “A reconfigurable streaming deep convolutional neural network accelerator for Internet of Things,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 65, no. 1, pp. 198–208, Jan. 2018.
[41] B. Bosi, G. Bois, and Y. Savaria, “Reconfigurable pipelined 2-D convolvers for fast digital signal processing,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 7, no. 3, pp. 299–308, Sep. 1999.
[42] Y. Guan et al., “FP-DNN: An automated framework for mapping deep neural networks onto FPGAs with RTL-HLS hybrid templates,” in Proc. IEEE Int. Symp. Field-Program. Custom Comput. Mach. (FCCM), Apr./May 2017, pp. 152–159.
[43] X. Wei et al., “Automated systolic array architecture synthesis for high throughput CNN inference on FPGAs,” in Proc. 54th Annu. Design Autom. Conf. (DAC), Jun. 2017, pp. 1–6.
[44] Y.-H. Chen, T. Krishna, J. S. Emer, and V. Sze, “Eyeriss: An energy-efficient reconfigurable accelerator for deep convolutional neural networks,” IEEE J. Solid-State Circuits, vol. 52, no. 1, pp. 127–138, Jan. 2017.
[45] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, pp. 436–444, May 2015.
[46] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” in Proc. Adv. Neural Inf. Process. Syst., vol. 25, 2012, pp. 1097–1105.
[47] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” CoRR, vol. abs/1409.1556, pp. 1–14, Sep. 2014.
[48] C. Szegedy et al., “Going deeper with convolutions,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2015, pp. 1–9.
[49] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 2016.
[50] R. Girshick, J. Donahue, T. Darrell, and J. Malik, “Rich feature hierarchies for accurate object detection and semantic segmentation,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2014, pp. 580–587.
[51] P. Sermanet, D. Eigen, X. Zhang, M. Mathieu, R. Fergus, and Y. LeCun, “OverFeat: Integrated recognition, localization and detection using convolutional networks,” CoRR, vol. abs/1312.6229, pp. 1–16, Dec. 2013.
[52] M. Bojarski et al., “End to end learning for self-driving cars,” 2016. [Online]. Available: https://arxiv.org/abs/1604.07316
[53] D. Silver et al., “Mastering the game of Go with deep neural networks and tree search,” Nature, vol. 529, no. 7587, pp. 484–489, Jan. 2016.
[54] R. Hameed et al., “Understanding sources of inefficiency in general-purpose chips,” in Proc. 37th Annu. Int. Symp. Comput. Archit., 2010, pp. 37–47.
[55] M. Horowitz, “Computing’s energy problem (and what we can do about it),” in IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers (ISSCC), Feb. 2014, pp. 10–14.
[56] M. Sankaradas et al., “A massively parallel coprocessor for convolutional neural networks,” in Proc. 20th IEEE Int. Conf. Appl.-Specific Syst., Archit. Process. (ASAP), Jul. 2009, pp. 53–60.
[57] V. Sriram, D. Cox, K. H. Tsoi, and W. Luk, “Towards an embedded biologically-inspired machine vision processor,” in Proc. Int. Conf. Field-Program. Technol. (FPT), Dec. 2010, pp. 273–278.
[58] S. Chakradhar, M. Sankaradas, V. Jakkula, and S. Cadambi, “A dynamically configurable coprocessor for convolutional neural networks,” in Proc. 37th Annu. Int. Symp. Comput. Archit., 2010, pp. 247–257.
[59] M. Peemen, A. A. A. Setio, B. Mesman, and H. Corporaal, “Memory-centric accelerator design for convolutional neural networks,” in Proc. IEEE 31st Int. Conf. Comput. Design (ICCD), Oct. 2013, pp. 13–19.
[60] V. Gokhale, J. Jin, A. Dundar, B. Martini, and E. Culurciello, “A 240 G-ops/s mobile coprocessor for deep neural networks,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. Workshops (CVPRW), Jun. 2014, pp. 696–701.
[61] T. Chen et al., “DianNao: A small-footprint high-throughput accelerator for ubiquitous machine-learning,” in Proc. 19th Int. Conf. Archit. Support Program. Lang. Oper. Syst., 2014, pp. 269–284.
[62] Z. Du et al., “ShiDianNao: Shifting vision processing closer to the sensor,” in Proc. 42nd Annu. Int. Symp. Comput. Archit., 2015, pp. 92–104.
[63] Y. Chen et al., “DaDianNao: A machine-learning supercomputer,” in Proc. 47th Annu. IEEE/ACM Int. Symp. Microarchitecture, 2014, pp. 609–622.
[64] S. Gupta, A. Agrawal, K. Gopalakrishnan, and P. Narayanan, “Deep learning with limited numerical precision,” CoRR, vol. abs/1502.02551, pp. 1–10, Feb. 2015.
[65] C. Zhang, P. Li, G. Sun, Y. Guan, B. Xiao, and J. Cong, “Optimizing FPGA-based accelerator design for deep convolutional neural networks,” in Proc. ACM/SIGDA Int. Symp. Field-Program. Gate Arrays, 2015, pp. 161–170.
[66] F. Conti and L. Benini, “A ultra-low-energy convolution engine for fast brain-inspired vision in multicore clusters,” in Proc. Design, Autom. Test Eur. Conf. Exhibit. (DATE), 2015, pp. 683–688.
[67] S. Park, K. Bong, D. Shin, J. Lee, S. Choi, and H.-J. Yoo, “A 1.93TOPS/W scalable deep learning/inference processor with tetra-parallel MIMD architecture for big-data applications,” in Proc. IEEE Int. Solid-State Circuits Conf. (ISSCC), Feb. 2015, pp. 1–3.
[68] L. Cavigelli, D. Gschwend, C. Mayer, S. Willi, B. Muheim, and L. Benini, “Origami: A convolutional network accelerator,” in Proc. 25th Ed. Great Lakes Symp. VLSI, 2015, pp. 199–204.
[69] J. Sim, J.-S. Park, M. Kim, D. Bae, Y. Choi, and L.-S. Kim, “A 1.42TOPS/W deep convolutional neural network recognition processor for intelligent IoE systems,” in Proc. IEEE Int. Solid-State Circuits Conf. (ISSCC), Jan./Feb. 2016, pp. 264–265.
[70] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” Proc. IEEE, vol. 86, no. 11, pp. 2278–2324, Nov. 1998.
[71] O. Russakovsky et al., “ImageNet large scale visual recognition challenge,” Int. J. Comput. Vis., vol. 115, no. 3, pp. 211–252, Dec. 2015.
[72] Y. Jia et al., “Caffe: Convolutional architecture for fast feature embedding,” 2014. [Online]. Available: https://arxiv.org/abs/1408.5093
[73] Y.-H. Chen, An 1000-Class Image Classification Task Performed by the Eyeriss-Integrated Deep Learning System, accessed 2016. [Online]. Available: https://vimeo.com/154012013
[74] Y. LeCun, K. Kavukcuoglu, and C. Farabet, “Convolutional networks and applications in vision,” in Proc. IEEE Int. Symp. Circuits Syst. (ISCAS), May/Jun. 2010, pp. 253–256.
[75] V. Nair and G. E. Hinton, “Rectified linear units improve restricted Boltzmann machines,” in Proc. 27th Int. Conf. Mach. Learn. (ICML), 2010, pp. 807–814.
[76] Y.-H. Chen, J. Emer, and V. Sze, “Eyeriss: A spatial architecture for energy-efficient dataflow for convolutional neural networks,” in Proc. 43rd Annu. Int. Symp. Comput. Archit. (ISCA), 2016, pp. 367–379.
[77] S. Han, J. Pool, J. Tran, and W. Dally, “Learning both weights and connections for efficient neural network,” in Proc. Adv. Neural Inf. Process. Syst., vol. 28, 2015, pp. 1135–1143.
[78] J. Howard et al., “A 48-core IA-32 message-passing processor with DVFS in 45 nm CMOS,” in IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers (ISSCC), Feb. 2010, pp. 108–109.
[79] B. K. Daya et al., “SCORPIO: A 36-core research chip demonstrating snoopy coherence on a scalable mesh NoC with in-network ordering,” in Proc. 41st Annu. Int. Symp. Comput. Archit. (ISCA), Jun. 2014, pp. 25–36.
[80] A. Ardakani, C. Condo, M. Ahmadi, and W. J. Gross, “An architecture to accelerate convolution in deep neural networks,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 65, no. 4, Apr. 2018.
[81] M. Hailesellasie and S. R. Hasan, “A fast FPGA-based deep convolutional neural network using pseudo parallel memories,” IEEE Xplore, 28 Sep. 2017.
[82] S. Wang, D. Zhou, X. Han, and T. Yoshimura, “Chain-NN: An energy-efficient 1D chain architecture for accelerating deep convolutional neural networks,” 2017. [Online]. Available: https://arxiv.org/abs/1703.01457
[83] D. Shin, J. Lee, J. Lee, and H.-J. Yoo, “14.2 DNPU: An 8.1TOPS/W reconfigurable CNN-RNN processor for general-purpose deep neural networks,” in IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, Feb. 2017, pp. 240–241.
[84] B. Moons, R. Uytterhoeven, W. Dehaene, and M. Verhelst, “Envision: A 0.26-to-10 TOPS/W subword-parallel dynamic-voltage-accuracy-frequency-scalable convolutional neural network processor in 28 nm FDSOI,” in IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, Feb. 2017, pp. 246–247.