
National Digital Library of Theses and Dissertations in Taiwan


Detailed Record

Author: 蔡孟芸
Author (English): Tsai, Meng-Yun
Title: 以Glow編譯神經網路模型至NVDLA
Title (English): Compile Deep Neural Networks to NVDLA Using Glow
Advisor: 陳添福
Advisor (English): Chen, Tien-Fu
Committee members: 陳添福、吳凱強、張貴忠、陳柏諭
Committee members (English): Chen, Tien-Fu; Wu, Kai-Chiang; Chang, Kuei-Chung; Chen, Po-Yu
Defense date: 2019-08-19
Degree: Master's
Institution: National Chiao Tung University
Department: Institute of Computer Science and Engineering
Discipline: Engineering
Field: Electrical Engineering and Computer Science
Document type: Academic thesis
Year of publication: 2019
Academic year of graduation: 108
Language: English
Pages: 49
Keywords (Chinese): 深度學習、加速器、編譯器
Keywords (English): deep learning; accelerator; compiler
Abstract:
With the development of artificial intelligence, we train models on different frameworks and deploy the pre-trained models to run on CPUs or GPUs. Because IoT devices are constrained in both compute capability and power, several deep learning accelerators have been designed to speed up deep learning computation. However, there is no complete workflow for deploying a pre-trained model from a framework to a deep learning accelerator (DLA). We choose the NVIDIA Deep Learning Accelerator (NVDLA) as our target. Since the NVDC compiler is not fully open-source and supports only Caffe models, we investigate NVDLA's design features by carefully inspecting the execution code generated by the NVDC compiler, and we work through its detailed implementation by studying the NVDLA hardware specification. In addition, we choose Glow, an open-source neural network compiler that compiles ONNX-formatted models into CPU executables. This thesis adds a deep learning accelerator backend to Glow by implementing a C runtime that maps ONNX operations to NVDLA hardware operators.
Abstract (Chinese) i
Abstract ii
Acknowledgements iii
Table of Contents iv
List of Tables vi
List of Figures vii
I. Introduction 1
II. Background and Related Work 2
2.1 NVIDIA Deep Learning Accelerator (NVDLA) 2
2.1.1 Hardware architecture overview 2
2.1.2 Software stack overview 6
2.2 Graph Lowering (Glow) 7
2.2.1 High-level IR 8
2.2.2 Low-level IR 9
2.3 TVM 11
2.3.1 Optimize Computational Graphs 11
2.3.2 Operator-level Optimization 13
2.4 ONNC 15
2.4.1 ONNC IR 15
2.4.2 Pass Manager for Different Targets 16
III. Methodology 17
3.1 System overview 18
3.2 Explore Features from loadable file 19
3.2.1 Build Debugging Tool on Simulator 19
3.2.2 Content of Loadable File 20
3.3 Implement C Runtime on NVDLA 21
3.3.1 In-memory Data Formats of Activations and Weights 22
3.3.2 Data partition on Convolution 24
3.3.3 Ping-pong Register Design 26
3.3.4 Subunit of NVDLA 28
3.3.5 Execution flow of C Runtime 28
IV. Implementations on Glow 30
4.1 Graph optimization 31
4.1.1 New Backend-Specific node and instruction 31
4.1.2 Operator fusion 32
4.1.3 Data layout transformation 34
4.1.4 Data partition for Convolution on HIR 35
4.2 Implementations for Code generation 37
4.2.1 Data partition and align address for data 38
4.2.2 Data transformation with GEMM 40
V. Evaluation 41
5.1 Environment setup 41
5.1.1 NVDLA virtual platform 41
5.1.2 NVDLA address space 42
5.2 Unit tests 43
5.2.1 Fully connected layer 43
5.3 Validation 45
VI. Conclusion and Future Work 47
VII. References 48
[1] Adam Paszke, et al. "PyTorch." (2017). [Online] Available: https://pytorch.org/
[2] Martin Abadi, et al. "TensorFlow: A system for large-scale machine learning." In 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI), 2016.
[3] NVIDIA. NVIDIA Deep Learning Accelerator (2017). [Online] Available: http://nvdla.org
[4] Y. Jia, E. Shelhamer, and J. Donahue, et al. "Caffe: Convolutional Architecture for Fast Feature Embedding." arXiv preprint arXiv:1408.5093, 2014.
[5] Nadav Rotem, et al. "Glow: Graph lowering compiler techniques for neural networks." arXiv preprint arXiv:1805.00907, 2018.
[6] Yann LeCun, et al. "Gradient-based learning applied to document recognition." Proceedings of the IEEE 86.11 (1998): 2278-2324.
[7] Alex Krizhevsky, et al. "Imagenet classification with deep convolutional neural networks." In Advances in Neural Information Processing Systems (NIPS), 2012.
[8] Kaiming He, et al. "Deep residual learning for image recognition." In Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), 2016.
[9] Tianqi Chen, et al. "TVM: An automated end-to-end optimizing compiler for deep learning." In 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 18), 2018.
[10] Wei-Fen Lin, et al. "ONNC: A compilation framework connecting ONNX to proprietary deep learning accelerators." In 2019 IEEE International Conference on Artificial Intelligence Circuits and Systems (AICAS), 2019.
[11] ONNX, Open Neural Network Exchange (2017). [Online] Available: https://github.com/onnx/onnx
[12] Sébastien Marcel, and Yann Rodriguez. "Torchvision the machine-vision package of torch." In Proceedings of the 18th ACM international conference on Multimedia, pp. 1485-1488. ACM, 2010.