National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)

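Author: Guei-Cheng Hu (胡桂誠)
Title: Co-Development of Software and Hardware for Machine Learning System-on-a-Chip Applied to Edge Devices (應用於邊緣裝置的機器學習系統晶片軟硬體共同開發)
Advisor: Ching-Han Chen (陳慶瀚)
Degree: Master's
Institution: National Central University (國立中央大學)
Department: In-Service Master Program, Department of Computer Science and Information Engineering (資訊工程學系在職專班)
Discipline: Engineering
Field: Electrical and Computer Engineering
Thesis type: Academic thesis
Year of publication: 2024
Academic year of graduation: 112 (ROC calendar)
Language: Chinese
Pages: 104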
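Keywords (Chinese): hardware accelerator (硬體加速器); system-on-chip (系統晶片); probabilistic neural network (機率神經網路); image segmentation (影像分割)
Keywords (English): RISC-V; PNN; SOC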
This study develops a machine learning system-on-a-chip (MLSoC) that integrates a Probabilistic Neural Network (PNN) with RISC-V, combining the advantages of hardware acceleration with the versatility of a microprocessor to support high-performance, highly customizable machine learning applications. Custom RISC-V instructions and an interrupt timing design optimize data transfer and processing between software and hardware, improving overall system efficiency. The study applies the MIAT system design methodology to achieve a highly modular design and a more flexible system architecture. In addition, to address the limited memory and computational resources of embedded systems, it proposes a variable-precision neural network development framework that lets developers adjust precision to their needs.
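The abstract combines two ingredients: a PNN classifier and adjustable numeric precision. The following is a minimal Python sketch of that combination, assuming the standard Gaussian-kernel PNN formulation (Specht, 1990) and a simple round-to-grid fixed-point quantizer. The function names, the toy training data, and the quantization scheme are illustrative assumptions; this is not the thesis's Verilog design or its framework API.

```python
import math


def quantize(x, frac_bits):
    """Round x to a fixed-point grid with `frac_bits` fractional bits (illustrative)."""
    scale = 1 << frac_bits
    return round(x * scale) / scale


def pnn_classify(sample, training, sigma, frac_bits=None):
    """Return the class label with the largest average Gaussian kernel response.

    training  -- dict mapping a class label to a list of feature vectors
    sigma     -- smoothing parameter of the Gaussian kernel
    frac_bits -- if given, all features are quantized first to mimic reduced precision
    """
    def prep(vec):
        if frac_bits is None:
            return list(vec)
        return [quantize(c, frac_bits) for c in vec]

    x = prep(sample)
    best_label, best_score = None, -1.0
    for label, patterns in training.items():
        total = 0.0
        for pattern in patterns:
            p = prep(pattern)
            dist2 = sum((a - b) ** 2 for a, b in zip(x, p))
            total += math.exp(-dist2 / (2.0 * sigma ** 2))
        score = total / len(patterns)  # per-class density estimate
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical two-class toy data, standing in for pixel features.
TRAINING = {
    "background": [[0.10, 0.10], [0.20, 0.15]],
    "object": [[0.80, 0.90], [0.90, 0.85]],
}

print(pnn_classify([0.85, 0.80], TRAINING, sigma=0.2))               # full precision
print(pnn_classify([0.85, 0.80], TRAINING, sigma=0.2, frac_bits=4))  # 4 fractional bits
```

Lowering `frac_bits` coarsens the feature grid, which is one simple way to explore the precision versus accuracy trade-off that a variable-precision framework exposes.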
Experimental results show that the MLSoC segments a 64x48 image in 66 milliseconds, about 21 microseconds per pixel, while consuming 0.00504 mWh of energy, indicating that the system delivers efficient computation at low power. The system also remains flexible and accurate across different precision settings.
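The reported figures can be cross-checked with a few lines of arithmetic. The sketch below uses the 64x48 image size and 66 ms frame time from the abstract, together with the 0.00504 mWh energy figure given in the record's Chinese abstract. It assumes, as a reading of the abstract rather than a stated fact, that the energy figure refers to segmenting a single image; the derived frame rate and average power are not values reported by the thesis.

```python
WIDTH, HEIGHT = 64, 48   # image size from the abstract
FRAME_TIME_S = 66e-3     # segmentation time per image, in seconds
ENERGY_MWH = 0.00504     # reported energy; assumed to be per image

pixels = WIDTH * HEIGHT
time_per_pixel_us = FRAME_TIME_S / pixels * 1e6
frames_per_second = 1.0 / FRAME_TIME_S

# Unit conversion: mWh -> Wh -> joules (1 Wh = 3600 J).
energy_j = ENERGY_MWH * 1e-3 * 3600.0
avg_power_w = energy_j / FRAME_TIME_S

print(f"pixels per image : {pixels}")
print(f"time per pixel   : {time_per_pixel_us:.2f} us")
print(f"frame rate       : {frames_per_second:.2f} images/s")
print(f"energy per image : {energy_j * 1e3:.3f} mJ")
print(f"average power    : {avg_power_w * 1e3:.1f} mW")
```

The per-pixel time works out to roughly 21.5 microseconds, consistent with the "approximately 21 microseconds" stated in the abstract.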
This research provides an efficient, low-power, and scalable hardware-software solution for machine learning. The MLSoC design has particular potential in industrial applications that require real-time image processing and object recognition. The results also offer a practical reference model for implementing further efficient machine learning solutions on FPGAs, supporting broader medical and industrial applications.
Abstract (Chinese)
Abstract (English)
Table of Contents
List of Figures
List of Tables
Chapter 1: Introduction
1.1 Research Background
1.2 Research Objectives
1.3 Thesis Organization
Chapter 2: Technical Review
2.1 RISC-V Neural Network Hardware Accelerators
2.1.1 Origin and Development of RISC-V
2.1.2 Basic Features of RISC-V
2.1.3 RISC-V Instruction Set Architecture
2.1.4 Hardware-Software Integration Architecture Design for RISC-V-Based Neural Network Hardware Accelerators
2.2 Neural Network Hardware Acceleration
2.2.1 Integer Quantization
2.2.2 Multi-Precision Neural Networks
2.3 Probabilistic Neural Networks
2.3.1 Probabilistic Neural Networks
2.3.2 Probabilistic Neural Network Hardware Accelerators
2.4 MIAT System Design Methodology
2.4.1 IDEF0 Hierarchical Modular Design
2.4.2 GRAFCET Discrete-Event Modeling
2.4.3 Hardware High-Level Synthesis
Chapter 3: Probabilistic Neural Network Hardware Accelerator Design
3.1 Probabilistic Neural Network Hardware Design
3.1.1 IDEF0
3.1.2 Probability Density Function Computation Module (A1)
3.1.3 Decision Module (A2)
3.1.4 Verilog Hardware Design
3.2 Fixed-Point Quantization
3.3 Variable-Precision Design
3.3.1 Software Variable-Precision Design
3.3.2 Multi-Precision Hardware Design
3.4 Pipelined Design
3.4.1 Pipelining Principles
3.4.2 PNN Pipelined Design
3.4.3 Pipelined PNN Verilog Design
Chapter 4: RISC-V Machine Learning System-on-a-Chip Design
4.1 RISC-V Processor and Development Platform Hardware Design
4.1.1 System Design
4.1.2 System Interrupt Configuration
4.2 Software Design for the RISC-V PNN Hardware Accelerator
4.2.1 RISC-V Software IDEF0
4.2.2 System State Machine (A1)
4.2.3 Data Set State Machine (A11)
4.2.4 Test Feature State Machine (A12)
4.2.5 Interrupt State Machine (A14)
4.3 RISC-V Extension Instruction Design
Chapter 5: RISC-V Hardware Accelerator Experiments
5.1 Experimental Environment
5.1.1 Experimental Platform
5.1.2 Test Dataset
5.1.3 Training Features
5.2 Probabilistic Neural Network Hardware Accelerator Experiments
5.2.1 Digital Circuit Synthesis
5.2.2 Timing Verification
5.2.3 Comparison of Test Results for Different Sigma Values
5.2.4 Comparison of Test Results for Different Bit Precisions
5.3 RISC-V System-on-a-Chip Experiments
5.3.1 System State Machine Test
5.3.2 Interrupt Trigger Test
5.3.3 Parallel Write Test
5.3.4 Execution Time Test
5.3.5 Implementation Results
5.3.6 Overall Evaluation of the Hardware Accelerator
Chapter 6: Conclusions and Future Work
6.1 Conclusions
6.2 Future Work
Chapter 7: References
Electronic full text (Internet release date: 2029-06-05)