Graduate Student: 鄭沐軒
Graduate Student (English): Mu-Hsuan Cheng
Title (Chinese): 基於分散式深度學習模型之可調式計算軟體框架的設計
Title (English): On Designing the Adaptive Computation Framework for Distributed Deep Learning Models
Advisor: 涂嘉恒
Advisor (English): Chia-Heng Tu
Degree: Master's
Institution: National Cheng Kung University (國立成功大學)
Department: Department of Computer Science and Information Engineering (資訊工程學系)
Discipline: Engineering
Field: Electrical Engineering and Computer Science
Document Type: Academic thesis
Publication Year: 2018
Academic Year of Graduation: 106 (2017–2018)
Language: English
Pages: 42
Keywords (Chinese): 分佈式系統設計、分佈式深度學習模型設計、平行運算、物聯網應用、嵌入式設備、OpenCL 加速、MQTT
Keywords (English): Distributed systems design; distributed deep learning model design; parallel computing; Internet-of-Things applications; embedded devices; OpenCL acceleration; MQTT
Record statistics:
  • Cited: 0
  • Views: 94
  • Downloads: 0
  • Bookmarked: 0
Abstract: We propose a computation framework that lets the inference of a distributed deep learning model be performed collaboratively by the devices in a distributed computing hierarchy. For example, in Internet-of-Things (IoT) applications, a three-tier computing hierarchy consists of end devices, gateways, and servers, and model inference can be completed adaptively by one or more computing tiers, from the bottom of the hierarchy to the top. By allowing trained models to run on actual distributed systems, which previous work had not done, the proposed framework enables the co-design of distributed deep learning models and systems. In particular, in addition to model accuracy, which is the model designers' major concern, we found that, because many types of computing platforms are present in IoT application fields, measuring the delivered performance of the developed models on the actual systems is also critical to ensuring that model inference does not take too much time on the end devices. Furthermore, the measured performance of the model (and the system) is a valuable input to the model/system design in the next design cycle, e.g., for determining a better mapping of the neural network layers onto the hierarchy tiers. On top of the framework, we built a surveillance system for detecting objects as a case study. In our experiments, we evaluate the delivered performance of model designs on a two-tier computing hierarchy, show the advantages of adaptive inference computation, analyze the system capacity under given workloads, and discuss the impact of the model parameter settings on the system capacity. We believe that enabling such performance evaluation expedites the design process of distributed deep learning models/systems.
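The adaptive inference described in the abstract follows an entropy-based early-exit scheme in the style of the distributed deep neural network (DDNN) work of Teerapittayanon et al.: the end device runs a small local model and forwards the sample to the next tier only when its prediction is too uncertain. A minimal sketch of that routing decision, with illustrative function names and toy models that are not the thesis's actual API:

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of a normalized probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def adaptive_inference(sample, device_model, server_model, threshold):
    """Two-tier adaptive inference: exit at the end device when the local
    prediction is confident enough, otherwise offload to the server tier."""
    local_probs = device_model(sample)
    if entropy(local_probs) < threshold:
        return local_probs, "device"       # confident: local early exit
    return server_model(sample), "server"  # uncertain: escalate upward

# Illustrative stand-ins for trained classifiers (4-class outputs).
confident = lambda x: [0.97, 0.01, 0.01, 0.01]
uncertain = lambda x: [0.25, 0.25, 0.25, 0.25]
server    = lambda x: [0.90, 0.05, 0.03, 0.02]

_, tier = adaptive_inference(None, confident, server, threshold=0.5)
print(tier)  # device
_, tier = adaptive_inference(None, uncertain, server, threshold=0.5)
print(tier)  # server
```

The entropy threshold trades device-side latency against accuracy: a larger threshold keeps more samples on the end device, a smaller one escalates more of them to the server tier (the trade-off studied in Section 5.2.1).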
Abstract (Chinese)
Abstract
Acknowledgements
Table of Contents
List of Tables
List of Figures
Chapter 1. Introduction
  1.1. Deep Learning and IoT
  1.2. Organization of this Dissertation
Chapter 2. Background and Motivation
  2.1. DDNN Training
  2.2. DDNN Inference
  2.3. Motivation and Contribution
Chapter 3. Adaptive Computation Framework
  3.1. Server Module
  3.2. Device Module
  3.3. Execution Model
  3.4. Iterative Evaluation Method for Exploring DDNN System Designs
Chapter 4. System Design Remarks
  4.1. Characteristics of the Target Systems
  4.2. Code Generation for End Devices
  4.3. Thread Handling Models in Model Inference Acceleration
  4.4. Communication Cost of the Hierarchical Inference
  4.5. Miscellaneous Features
Chapter 5. Experimental Results
  5.1. Model Design
  5.2. Distributed Inference
    5.2.1. Impact of Entropy Threshold
    5.2.2. Accelerating Device Inference with OpenCL
Chapter 6. Related Work
  6.1. High Resource Consumption for IoT
  6.2. The Hybrid Approach
  6.3. NN Model Optimization
Chapter 7. Conclusion
References