臺灣博碩士論文加值系統 (National Digital Library of Theses and Dissertations in Taiwan)

Detailed Record
Student: 湯智惟
Student (English): Tang, Chih-Wei
Title: 透過資料重用和流量負載之 AI 加速器的高效工作負載分析與流量優化
Title (English): Efficient Workload Analysis and Traffic Optimization for AI Accelerators by Data Reuse and Traffic Pattern Techniques
Advisor: 陳添福
Advisor (English): Chen, Tien-Fu
Committee Members: 陳中和, 張貴忠, 林泰吉, 陳添福
Committee Members (English): Chen, Chung-Ho; Chang, Kuei-Chung; Lin, Tay-Jyi; Chen, Tien-Fu
Oral Defense Date: 2023-08-10
Degree: Master's
University: 國立陽明交通大學 (National Yang Ming Chiao Tung University)
Department: 網路工程研究所 (Institute of Network Engineering)
Discipline: Engineering
Academic Field: Electrical and Computer Engineering
Thesis Type: Academic thesis
Publication Year: 2023
Graduation Academic Year: 112 (ROC calendar)
Language: English
Pages: 42
Keywords (Chinese): AI 加速器, 分析模型, 深度神經網路, 工作負載分析, 流量優化
Keywords (English): AI Accelerator, Analytical Model, Deep Neural Network, Workload Analysis, Traffic Optimization
Abstract:
As deep learning models grow in complexity, their computation and memory needs expand accordingly. AI accelerators offer a solution to these challenges; however, implementing them comes with significant costs and difficulties, so evaluating the performance of deep learning models on accelerators is crucial. AI analytical models are frequently employed for this purpose, but previous works often rely solely on mathematical estimation and overlook memory management, leading to inaccurate results.

We present a high-level AI accelerator analyzer that combines the speed of analytical models with the accuracy of simulators. Our approach conducts a comprehensive workload analysis to generate traffic patterns, giving insight into data movement, and incorporates memory management to simulate memory read and write operations. In addition, our work supports fused-operator splitting and analysis as well as traffic optimization techniques that reduce overall traffic. The proposed method significantly improves the precision of performance evaluation, enhancing the effectiveness of AI accelerator frameworks.
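The abstract only outlines the analyzer at a high level, so the following is a minimal illustrative sketch of its central idea: estimating a convolution layer's DRAM traffic analytically, where accounting for on-chip data reuse (here, a generic output-tiled dataflow) gives a far lower and more realistic traffic figure than a pure per-MAC estimate. All names (ConvLayer, naive_traffic, tiled_traffic) and the tiling scheme are assumptions for illustration, not the thesis's actual implementation, which additionally models memory management and emits traffic patterns rather than aggregate byte counts.

```python
# Minimal sketch (NOT the thesis's analyzer): estimate per-layer DRAM
# traffic for a convolution, with and without on-chip data reuse.
# ConvLayer, naive_traffic, and tiled_traffic are hypothetical names,
# and the output-tiled dataflow is a generic choice made for clarity.
from dataclasses import dataclass
from math import ceil

@dataclass
class ConvLayer:
    n: int            # batch size
    c: int            # input channels
    h: int            # input height
    w: int            # input width
    k: int            # output channels
    r: int            # kernel height
    s: int            # kernel width
    stride: int = 1

    @property
    def out_h(self) -> int:
        return (self.h - self.r) // self.stride + 1

    @property
    def out_w(self) -> int:
        return (self.w - self.s) // self.stride + 1

def naive_traffic(L: ConvLayer, bytes_per_elem: int = 1) -> int:
    """Pure mathematical estimate with no reuse: every MAC refetches both
    operands from DRAM, and every output element is written back once."""
    macs = L.n * L.k * L.out_h * L.out_w * L.c * L.r * L.s
    outputs = L.n * L.k * L.out_h * L.out_w
    return (2 * macs + outputs) * bytes_per_elem

def tiled_traffic(L: ConvLayer, tile_k: int, tile_oh: int, tile_ow: int,
                  bytes_per_elem: int = 1) -> int:
    """With on-chip reuse: each output tile fetches its input window and
    weight slice from DRAM once, so reads scale with the number of tiles
    rather than with the number of MACs (edge tiles over-approximated)."""
    in_h = (tile_oh - 1) * L.stride + L.r      # input rows a tile touches
    in_w = (tile_ow - 1) * L.stride + L.s      # input cols a tile touches
    tiles = (L.n * ceil(L.k / tile_k)
             * ceil(L.out_h / tile_oh) * ceil(L.out_w / tile_ow))
    reads_per_tile = L.c * in_h * in_w + tile_k * L.c * L.r * L.s
    writes = L.n * L.k * L.out_h * L.out_w     # each output written once
    return (tiles * reads_per_tile + writes) * bytes_per_elem

if __name__ == "__main__":
    layer = ConvLayer(n=1, c=64, h=56, w=56, k=64, r=3, s=3)  # ResNet-like
    print(f"no reuse:   {naive_traffic(layer):>12,} bytes")
    print(f"with reuse: {tiled_traffic(layer, 16, 14, 14):>12,} bytes")
```

On this ResNet-style layer the two estimates differ by roughly two orders of magnitude, which illustrates why a model that ignores reuse and memory behavior can badly misestimate traffic, the kind of inaccuracy the thesis sets out to eliminate.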
Table of Contents:
摘要 (Chinese Abstract)
Abstract
Table of Contents
List of Figures
List of Tables
I. Introduction
  1.1 Motivation
  1.2 Problem Definition
  1.3 Contributions
II. Background and Related Work
  2.1 Deep Learning Accelerator
    2.1.1 Dataflow
  2.2 Data Reuse
  2.3 Analytical Model
    2.3.1 Common Analytical Model
    2.3.2 Limitations
  2.4 Model Splitting & Model Mapping
  2.5 Summary
III. Traffic Generator
  3.1 System Design
    3.1.1 System Architecture
    3.1.2 Hardware Architecture
    3.1.3 Design Space Exploration (DSE)
    3.1.4 Hardware Simulator
  3.2 Model Splitting
  3.3 Model Mapping
  3.4 Memory Management & Workload Analysis
  3.5 Traffic Generator
    3.5.1 Traffic Pattern
    3.5.2 Traffic Pattern Example
IV. Traffic Optimization
  4.1 Fused Operators Support
  4.2 Weight Aggregation
V. Evaluation
  5.1 Environment
  5.2 Evaluation on Fused Operators
  5.3 Evaluation on Weight Aggregation
VI. Conclusion and Future Work
VII. References