Author: 帕默巴
Author (English): Mustafa Baran Peker
Title: Learning Pursuer and Evasion Strategy Using the Q Network
Advisor: 陳介力
Advisor (English): Chieh-Li Chen
Degree: Master's
Institution: National Cheng Kung University
Department: Department of Aeronautics and Astronautics
Discipline: Engineering
Field: Mechanical Engineering
Thesis Type: Academic thesis
Publication Year: 2020
Graduation Academic Year: 108 (2019-2020)
Language: English
Pages: 63
Keywords (English): Pursuit-Evasion; Deep Q-Learning; Artificial Potential Field; Multi-Agent Systems
The game of pursuit-evasion has long been a popular research topic in the field of robotics, especially in recent decades, as agents have become intelligent agents that exploit information gathered from their environment without any prior knowledge of it. This trend has attracted a remarkable amount of attention and opened the area to newcomers from several different disciplines.
Reinforcement learning is a widely used method in the pursuit-evasion domain. As agents interact with the environment, they use the feedback (rewards and punishments) received from it to learn and optimize their actions.
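As a minimal sketch of this feedback loop (illustrative only: the tabular setting, state and action counts, learning rate, and discount factor below are assumptions, not values from the thesis), the Q-learning update of Section 2.3.2 combined with the epsilon-greedy selection of Section 2.3.1 can be written as:

    import numpy as np

    # Hypothetical tabular setup: e.g. a 10x10 grid world with 4 moves.
    n_states, n_actions = 100, 4
    alpha, gamma, epsilon = 0.1, 0.95, 0.1   # assumed hyperparameters
    Q = np.zeros((n_states, n_actions))

    def choose_action(state):
        # Epsilon-greedy: explore with probability epsilon, else exploit.
        if np.random.rand() < epsilon:
            return np.random.randint(n_actions)
        return int(np.argmax(Q[state]))

    def update(state, action, reward, next_state):
        # Move Q(s, a) toward the bootstrapped target r + gamma * max_a' Q(s', a').
        target = reward + gamma * np.max(Q[next_state])
        Q[state, action] += alpha * (target - Q[state, action])

Deep Q-Learning replaces the table Q with a neural network approximator, but the target and update follow the same pattern.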
In this master's thesis, research has been done on the multi-agent pursuit-evasion game problem in an environment with an obstacle using reinforcement learning, and the experimental results are presented. The intelligent agents use the Deep Q-Learning algorithm and an artificial potential field to solve the problem.
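A rough sketch of the artificial potential field component (the FIRAS-style force law, gain k, and influence radius d0 below are standard textbook choices assumed for illustration, not necessarily the thesis's exact formulation):

    import numpy as np

    def repulsive_force(agent_pos, obstacle_pos, k=1.0, d0=3.0):
        # Repulsive force that grows as the agent nears an obstacle or
        # opponent and vanishes beyond the influence radius d0.
        diff = np.asarray(agent_pos, float) - np.asarray(obstacle_pos, float)
        d = np.linalg.norm(diff)
        if d >= d0 or d == 0.0:
            return np.zeros(2)  # outside the field's influence (or degenerate)
        # Magnitude k * (1/d - 1/d0) / d**2, directed away from the obstacle.
        magnitude = k * (1.0 / d - 1.0 / d0) / d**2
        return magnitude * (diff / d)

One common design, consistent with Sections 3.3.1-3.3.2 and 3.4.1 of the table of contents, is to sum such forces from pursuers, boundaries, and obstacles to obtain a steering cue alongside the learned policy.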
Two different approaches are adopted at the level of multi-robot cooperative systems to manage the interaction between agents: team learning and concurrent learning. In team learning, the agents are managed as a team from a single center, and the mind used for learning resides in that center. This makes the agents easy to manage, but they have no say of their own: during the learning phase each pursuer has knowledge of the other pursuers, and its actions depend on the locations of all pursuers, the evader, and the obstacles. In concurrent learning, each agent is an individual responsible for its own moves and learns with its own mind, isolated from the others; a pursuer's actions depend only on its own location, the evader, and the obstacles. In this work, both the team learning and concurrent learning approaches are adapted for training the pursuit team.
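The practical difference between the two schemes is what each pursuer's learner observes. A hedged illustration (the positions and entity layout below are invented for the example): team learning concatenates every pursuer, the evader, and the obstacle into one centralized state, while concurrent learning gives each pursuer only its own position plus the evader and obstacle:

    import numpy as np

    # Invented example positions for two pursuers, one evader, one obstacle.
    pursuers = [np.array([1.0, 2.0]), np.array([4.0, 0.0])]
    evader = np.array([7.0, 5.0])
    obstacle = np.array([3.0, 3.0])

    def team_state():
        # Team learning: one centralized state covering the whole team;
        # a single "mind" selects actions for every pursuer.
        return np.concatenate(pursuers + [evader, obstacle])

    def concurrent_state(i):
        # Concurrent learning: pursuer i sees only its own position,
        # the evader, and the obstacle, and learns in isolation.
        return np.concatenate([pursuers[i], evader, obstacle])

The centralized state grows with the team size, which eases coordination but couples every pursuer's policy to the whole team; the concurrent state stays small and isolated at the cost of explicit cooperation.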
TABLE OF CONTENTS

ABSTRACT I
TABLE OF CONTENTS III
ABBREVIATIONS VI
SYMBOLS VII
LIST OF FIGURES VIII
INTRODUCTION 1
LITERATURE REVIEW 5
2.1 Pursuit-Evasion Game Problem 5
2.2 Multi-Agent Systems 8
2.2.1 Team Learning 10
2.2.2 Concurrent Learning 10
2.3 Reinforcement Learning 11
2.3.1 Epsilon Greedy Action Selection 15
2.3.2 Q-Learning 16
2.3.3 Deep Q Learning 17
2.4 Artificial Potential Field 21
METHOD 23
3.1 Environment 25
3.2 Eye-Sight 27
3.3 Evader Training 28
3.3.1 Repulsive Forces Created by Pursuers 29
3.3.2 Repulsive Forces Created by Boundaries and Obstacles 31
3.3.3 Reward Function 31
3.4 Pursuer Training 32
3.4.1 Repulsive Force Created by Evader 33
3.4.2 Reward Function 34
EXPERIMENT 35
4.1 Value Function Analysis 36
4.1.1 Value Function Analysis for Evader 37
4.1.2 Value Function Analysis for Pursuer Trained with Team Learning 40
4.1.3 Value Function Analysis for Pursuer Trained with Concurrent Learning 43
4.2 Comparison Among Agents on a Step Basis 45
4.3 Comparison Among Agents on an Episode Basis 49
4.4 Optimal Step Size in an Episode 53
CONCLUSION 59
REFERENCES 61
REFERENCES
[1] R. Vidal, O. Shakernia, H.J. Kim, D.H. Shim, S. Sastry, "Probabilistic pursuit-evasion games: theory, implementation and experimental evaluation," IEEE Transactions on Robotics and Automation, vol. 18, no. 5, pp. 662-669, 2002.
[2] R.S. Sutton, A.G. Barto, "Reinforcement Learning: An Introduction," The MIT Press, Cambridge, Massachusetts, 1998.
[3] P. Stone, M. Veloso, "Multiagent systems: a survey from a machine learning perspective," Autonomous Robots, vol. 8, pp. 345-383, 2000.
[4] Z. Zhu, "Learning evasion strategy in pursuit-evasion by deep Q-network," 24th International Conference on Pattern Recognition (ICPR), Beijing, China, 2018.
[5] R. Isaacs, "Differential Games: A Theory with Applications to Warfare and Pursuit, Control and Optimization," John Wiley & Sons, New York, 1965.
[6] M.M. Flood, "The hide and seek game of Von Neumann," Management Science, vol. 18, no. 5, January 1972.
[7] Y.-C. Chen, "MAS-based pursuit-evasion algorithm under unknown environment," 2005 International Conference on Machine Learning and Cybernetics, 2005.
[8] J.W. Durham, A. Franchi, F. Bullo, "Distributed pursuit-evasion without mapping or global localization via local frontiers," Autonomous Robots, vol. 32, pp. 81-95, 2012.
[9] L.E. Parker, "Distributed algorithms for multi-robot observation of multiple moving targets," Autonomous Robots, vol. 12, pp. 231-255, 2002.
[10] J. Liu, S. Liu, H. Wu, Y. Zhang, "A pursuit-evasion algorithm based on hierarchical reinforcement learning," International Conference on Measuring Technology and Mechatronics Automation, Hunan, China, 11-12 April 2009.
[11] T. Haynes, S. Sen, "Evolving behavioral strategies in predators and prey," Adaptation and Learning in Multiagent Systems, Springer-Verlag, Berlin, pp. 113-126, 1996.
[12] X. Fan, "A novel pursuit strategy for fast evader in indoor pursuit-evasion games," Proceedings of the 10th World Congress on Intelligent Control and Automation, 2012.
[13] L. Panait, S. Luke, "Cooperative multi-agent learning: the state of the art," Autonomous Agents and Multi-Agent Systems, vol. 11, pp. 387-434, 2005.
[14] E.U. Acar, H. Choset, Y. Zhang, M. Schervish, "Path planning for robotic demining: robust sensor-based coverage of unstructured environments and probabilistic methods," The International Journal of Robotics Research, vol. 22, pp. 441-466, 2003.
[15] https://www.groundai.com/project/solving-the-scalarization-issues-of-advantage-based-reinforcement-learning-algorithms/1
[16] N. Ono, K. Fukumoto, "Multi-agent reinforcement learning: a modular approach," 2nd International Conference on Multiagent Systems, Kyoto, Japan, 9-13 December 1996.
[17] S. Russell, P. Norvig, "Artificial Intelligence: A Modern Approach," 1995.
[18] B. Bouzy, M. Metivier, "Multi-agent model-based reinforcement learning experiments in the pursuit evasion game," 2007.
[19] K. Arulkumaran, M.P. Deisenroth, "Deep reinforcement learning: a brief survey," IEEE Signal Processing Magazine, pp. 26-38, November 2017.
[20] E. Holly, S. Gu, "Deep reinforcement learning for robotic manipulation with asynchronous off-policy updates," 2017 IEEE International Conference on Robotics and Automation (ICRA), July 2017.
[21] https://www.kdnuggets.com/2018/03/5-things-reinforcement-learning.html
[22] https://towardsdatascience.com/self-learning-ai-agents-part-ii-deep-q-learning-b5ac60c3f47
[23] https://medium.com/@rymshasiddiqui/path-planning-using-potential-field-algorithm-a30ad12bdb08