
National Digital Library of Theses and Dissertations in Taiwan

Detailed Record

Author: 陳致強
Author (English): Chen, Chih-Chiang
Title: 應用於卷積類神經網路之具能源效益加速器及資料處理流程
Title (English): Energy-Efficient Accelerator and Data Processing Flow for Convolutional Neural Network
Advisor: 黃威
Advisor (English): Hwang, Wei
Committee members: 莊景德、黃柏蒼
Committee members (English): Chuang, Ching-Te; Huang, Po-Tsang
Oral defense date: 2017-09-18
Degree: Master's
Institution: 國立交通大學 (National Chiao Tung University)
Department: 電子研究所 (Institute of Electronics)
Discipline: Engineering
Field of study: Electrical and computer engineering
Document type: Academic thesis
Year of publication: 2017
Academic year of graduation: 106
Language: Chinese
Pages: 75
Keywords (Chinese): 卷積類神經網路、加速器、資料處理流程
Keywords (English): Convolutional Neural Network; Accelerator; Data Processing Flow
Statistics:
  • Cited: 0
  • Views: 419
  • Rating:
  • Downloads: 0
  • Bookmarked: 0
In recent years, machine learning and the convolutional neural network (CNN) have become among the most popular research topics of this era. Limited by immature hardware, the field received little attention in earlier decades; the rapid progress of hardware technology in recent years has driven a new wave of research. Because a CNN requires substantial computation and an enormous amount of data access and movement, the energy spent on data access can even exceed that of the computation itself, so exploiting data reuse and reducing data access has become a major research problem. In this thesis, we propose a Processing Element (PE) architecture that reuses data effectively without fetching it again from external memory. We also propose a data processing flow that propagates data between PEs, further increasing data reuse. In addition, we propose 3D and 2.5D accelerator system architectures that depart from the conventional approach: the through-silicon vias (TSVs) of the 3D stack further reduce the energy consumed by data movement. We compare the conventional 2D design with the 2.5D and 3D designs in energy consumption, speed, and other metrics, and summarize the results in a table. In the later chapters we also present an FPGA implementation flow as a reference for future work on this topic. Overall, we present an innovative reconfigurable accelerator for deep learning networks that suits both computation-intensive and data-intensive applications. This reconfigurable computing hardware technique can mitigate the power and memory walls and can be applied to computer vision, computer graphics, convolutional neural networks, and deep learning networks.
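This record contains only the abstract, so the reuse mechanism can only be illustrated generically. Below is a minimal back-of-the-envelope sketch in Python, not the thesis's actual design: it assumes a hypothetical H×W input feature map and K×K kernel, and compares the number of external-memory reads when every multiply fetches its own input pixel against the case where each pixel is fetched once and then forwarded between neighbouring PEs, the kind of inter-PE propagation the abstract describes.

# Minimal sketch (not the thesis's actual architecture): estimate how much
# inter-PE data forwarding can cut external-memory traffic for one
# convolutional layer. All parameter values below are hypothetical.

def external_reads_naive(h, w, k):
    """Every multiply fetches its input pixel from external memory."""
    h_out, w_out = h - k + 1, w - k + 1   # 'valid' convolution output size
    return h_out * w_out * k * k

def external_reads_with_reuse(h, w, k):
    """Each input pixel is fetched from external memory once, held in a
    PE's internal storage, and then propagated between neighbouring PEs,
    so the k*k overlapping windows share the same fetch."""
    return h * w

if __name__ == "__main__":
    h = w = 224   # hypothetical feature-map size
    k = 3         # hypothetical 3x3 kernel
    naive = external_reads_naive(h, w, k)
    reuse = external_reads_with_reuse(h, w, k)
    print(f"naive fetches:     {naive:,}")
    print(f"with propagation:  {reuse:,}")
    print(f"traffic reduction: {naive / reuse:.1f}x")   # ~k*k for large maps

Under this toy model a 3×3 kernel approaches a 9× reduction in external-memory reads, which is consistent with the abstract's emphasis on forwarding data through the PE array instead of re-fetching it from external memory.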
Chapter 1 Introduction
1.1 Motivation
1.2 Contribution
1.2.1 Reconfigurable Processing Element
1.2.2 Energy-Efficient Data Flow
1.2.3 3D Architecture Accelerator
1.3 Thesis Organization
Chapter 2 Related Work and Previous Research
2.1 Introduction
2.1.1 Neural Network
2.1.2 Convolutional Neural Network
2.2 Eyeriss: A Spatial Architecture for Energy-Efficient Dataflow for Convolutional Neural Networks
2.2.1 Motivation
2.2.2 Energy-Efficient Dataflow: Row Stationary
2.2.3 Dataflow Implementation and Energy Analysis
2.3 ShiDianNao: Shifting Vision Processing Closer to the Sensor
2.3.1 Overall Architecture
2.3.2 Neural Functional Unit and Control Unit
2.3.3 System Operation
2.4 TETRIS: Neural Network Acceleration with 3D Memory
2.4.1 Introduction
2.4.2 Architecture
2.4.3 Accumulation Method
Chapter 3 Reconfigurable Processing Element Design for Convolutional Neural Network
3.1 Introduction
3.2 Reconfigurable Processing Element Architecture
3.2.1 Processing Element Architecture
3.2.2 Internal Storage Elements
3.3 Simulation Results
Chapter 4 Energy-Efficient Data Processing Flow for Convolutional Neural Network
4.1 Data Processing Flow with PE Array
4.1.1 PE Array ID Number
4.1.2 Data Processing Flow of PE Array
4.2 Data Reuse Propagation inside PE Array
4.3 Simulation Results
Chapter 5 2.5D/3D Architecture Accelerator and FPGA Implementation Methodology
5.1 Introduction
5.1.1 1D, 2D, 3D Convolution
5.1.2 Conventional 2D Accelerator
5.1.3 Analysis of 3D and 2.5D Structures
5.2 Proposed 3D/2.5D Accelerator
5.2.1 Proposed 3D Accelerator
5.2.2 2.5D Accelerator
5.2.3 Memory Hierarchy
5.3 CGRA Implementation Methodology
Chapter 6 Conclusion and Future Work
References
Vita