Author: 孟令昇
Author (English): Ling-Sheng Meng
Title: 適用於自主機器人導航之深度強化學習連續控制方法
Title (English): Continuous Control with Deep Reinforcement Learning for Autonomous Robot Navigation
Advisor: 蔡曉萍
Committee Members: 溫志煜, 黃俊龍
Defense Date: 2020-01-17
Degree: Master's
Institution: National Chung Hsing University (國立中興大學)
Department: Graduate Institute of Communication Engineering (通訊工程研究所)
Discipline: Engineering
Field: Electrical and Computer Engineering
Document Type: Academic thesis
Publication Year: 2020
Graduation Academic Year: 108 (2019-2020)
Language: English
Pages: 49
Keywords: Artificial Intelligence, Autonomous Robot Navigation, Deep Reinforcement Learning, Social Norm, Deep Deterministic Policy Gradients, Soft Actor-Critic, Convolutional Neural Network, Social Force
Abstract (translated from the Chinese):
In recent years, with the development of artificial intelligence, research on applying it to autonomous navigation for mobile robots has flourished. When an autonomous robot performs a navigation task through deep reinforcement learning, it must learn two tasks: reaching the destination quickly and avoiding obstacles safely. Compared with traditional navigation methods, deep reinforcement learning allows the robot to reach its destination with low-cost sensors and in more complex environments. Whereas the Deep Q Network (DQN) algorithm is suited to discrete control, learning methods such as Deep Deterministic Policy Gradients (DDPG) and Soft Actor-Critic (SAC) perform better at the continuous action control needed for robot navigation.
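The distinction matters because DQN can only pick the best action from a finite set, while SAC learns a stochastic policy over continuous velocity commands. For context, the following is the standard maximum-entropy objective from the published SAC papers by Haarnoja et al., not notation taken from this thesis:

```latex
J(\pi) = \sum_{t=0}^{T} \mathbb{E}_{(s_t, a_t) \sim \rho_\pi}
  \Big[ r(s_t, a_t) + \alpha \, \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \Big]
```

The entropy term \(\mathcal{H}\) keeps the policy exploratory, and the temperature \(\alpha\) trades reward against entropy, which is what makes SAC comparatively stable and sample-efficient for continuous robot control.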
In this study, we use the SAC algorithm to train the robot for autonomous navigation. We further propose a reward-and-penalty formulation with different types of social regions, so that the neural network learns the appropriate social distance to keep for each type of obstacle. To identify obstacle types and the extent of their social regions, we take RGB images captured in the simulated environment as input and use a Convolutional Neural Network (CNN) to extract features from the images, reducing their dimensionality for subsequent training; the extracted features then serve as one of the inputs to SAC.
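The abstract does not spell out the reward formula, so the sketch below only illustrates the described idea: a penalty that grows as the robot intrudes into an obstacle's social region, with a different region size per obstacle type. All names and constants (SOCIAL_RADIUS, the penalty scale, the terminal rewards) are illustrative assumptions, not the thesis's parameters.

```python
# A minimal sketch of a social-region reward, assuming per-type social
# radii and a linear intrusion penalty; values are illustrative only.
SOCIAL_RADIUS = {"pedestrian": 1.2, "static_obstacle": 0.5}  # metres (assumed)

def social_region_penalty(distance, obstacle_type, penalty_scale=1.0):
    """Negative reward while the robot is inside an obstacle's social
    region; zero once it keeps the appropriate social distance."""
    radius = SOCIAL_RADIUS.get(obstacle_type, 0.5)
    if distance >= radius:
        return 0.0
    # Penalty deepens linearly as the robot moves further into the region.
    return -penalty_scale * (radius - distance) / radius

def step_reward(goal_progress, min_dist_by_type, reached_goal, collided):
    """Combine goal progress, social penalties, and terminal events into
    one scalar step reward (structure only; magnitudes are assumed)."""
    if collided:
        return -10.0  # assumed collision penalty
    if reached_goal:
        return 10.0   # assumed arrival bonus
    reward = goal_progress  # positive when the robot closes in on the goal
    for obstacle_type, distance in min_dist_by_type.items():
        reward += social_region_penalty(distance, obstacle_type)
    return reward
```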
Experimental results show that, in a social environment, our mobile robot not only reaches its destination quickly and avoids obstacles safely but also observes social norms, ensuring that its behavior does not disturb the psychological comfort of others.
Abstract (English):
With the development of artificial intelligence in recent years, research on applying artificial intelligence to the autonomous navigation of mobile robots has attracted more and more attention. For autonomous robot navigation, we use deep reinforcement learning (Deep-RL) to train the robot to learn two major tasks: reaching the destination quickly while avoiding collisions with obstacles. Compared with traditional navigation methods, deep reinforcement learning enables the robot to navigate in more complex environments with low-cost sensors. Unlike the Deep Q Network (DQN) algorithm, which is limited to discrete control, we adopt a continuous control strategy for robot navigation to obtain better performance, using Deep Deterministic Policy Gradients (DDPG) and Soft Actor-Critic (SAC).

In this thesis, we use the SAC algorithm to train the robot for autonomous navigation. Furthermore, we propose a social-region reward and punishment function for different types of obstacles, whose purpose is to let the neural network learn the proper social distance for each type of obstacle. To identify the type of obstacles and the range of their social regions, we take top-view RGB images from a Gazebo simulated environment as input and use a Convolutional Neural Network (CNN) to extract features from the images, achieving dimensionality reduction for the benefit of subsequent training. The extracted features are then used as one of the SAC inputs.

According to the experimental results, in a social environment our mobile robot observes social norms in addition to reaching its destination quickly and avoiding collisions, ensuring that the robot's behavior does not affect the psychological comfort of others.
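To make the image pipeline concrete, here is a minimal PyTorch sketch of the described idea: a small CNN compresses the top-view RGB frame into a low-dimensional feature vector, which is concatenated with the robot's other readings to form the SAC state input. The layer sizes, the 84x84 frame, and the six-dimensional robot state are assumptions, not the thesis's actual architecture.

```python
import torch
import torch.nn as nn

class ImageFeatureExtractor(nn.Module):
    """CNN that reduces an RGB image to a compact feature vector."""
    def __init__(self, feature_dim=64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2), nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=3, stride=2), nn.ReLU(),
        )
        self.head = nn.LazyLinear(feature_dim)  # infers the flattened size

    def forward(self, image):
        x = self.conv(image)      # (B, 32, H', W')
        x = torch.flatten(x, 1)   # (B, 32 * H' * W')
        return self.head(x)       # (B, feature_dim)

extractor = ImageFeatureExtractor()
image = torch.rand(1, 3, 84, 84)   # one top-view RGB frame (assumed size)
robot_state = torch.rand(1, 6)     # assumed goal/velocity readings
# The concatenated vector is what the SAC actor and critics consume.
sac_input = torch.cat([extractor(image), robot_state], dim=1)
print(sac_input.shape)  # torch.Size([1, 70])
```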
Chinese Abstract i
Abstract ii
Content iv
List of Tables vi
List of Figures vii
Chapter 1 Introduction 1
Chapter 2 Related Work 8
2.1 Map-based Path Planning 8
2.2 Learning-based Navigation 9
Chapter 3 Preliminary 11
3.1 Reinforcement Learning (RL) 11
3.1.1 Deep Q Network (DQN) 13
3.1.2 Actor-Critic 13
3.1.3 Deep Deterministic Policy Gradient (DDPG) 14
3.1.4 Soft Actor-Critic (SAC) 14
3.2 Convolutional Neural Network (CNN) 16
3.3 Robot Operating System (ROS) 16
3.4 Gazebo 17
Chapter 4 Method 19
4.1 Purpose 19
4.2 LiDAR-based 19
4.3 Vision-based 20
4.4 Network Structure 22
4.4.1 LiDAR-based Network Structure 22
4.4.2 Vision-based Network Structure 23
4.4.3 Network Hyperparameters 24
4.5 Reward Function 24
4.5.1 LiDAR-based Reward Function 24
4.5.2 Vision-based Reward Function 26
Chapter 5 Experimental Results 28
5.1 Computer Hardware and Software Specifications 28
5.2 Robot Model 29
5.2.1 Real Robot 29
5.2.2 Simulation Robot 32
5.3 Environments 33
5.3.1 LiDAR-based Environments 33
5.3.2 Vision-based Environments 35
5.4 Experimental Results and Discussion 36
5.4.1 Angle Reward Results 36
5.4.2 LiDAR-based Results 37
5.4.3 Vision-based Results 39
Chapter 6 Conclusion 44
References 45
Electronic Full Text (publicly available online from 2023-02-14)