Researcher: 陳建成
Researcher (English): Chien-Cheng Chen
Thesis Title: 以適應性Q-Learning為基礎發展足球機器人合作策略
Thesis Title (English): Cooperative Strategy Based on Adaptive Q-Learning for Robot Soccer Systems
Advisor: 黃國勝
Advisor (English): Kao-Shing Hwang
Degree: Master's
Institution: National Chung Cheng University (國立中正大學)
Department: Graduate Institute of Electrical Engineering
Discipline: Engineering
Field of Study: Electrical and Computer Engineering
Thesis Type: Academic thesis
Year of Publication: 2003
Graduation Academic Year: 91 (ROC calendar; 2002-2003)
Language: Chinese
Pages: 60
Keywords (Chinese): 足球機器人, 合作策略, 加強式學習, 適應性Q-Learning, 合作協調
Keywords (English): Robot Soccer Game, Cooperative Strategy, Reinforcement Learning, Adaptive Q-Learning, Cooperation and Coordination
Usage statistics:
  • Cited by: 6
  • Views: 438
  • Rating:
  • Downloads: 88
  • Bookmarked: 3
Abstract:
The main objective of this thesis is to develop a cooperative strategy with learning capability for soccer robots. The strategy enables the robots to cooperate and coordinate with one another during a game so as to achieve cooperative offense and defense. Through a learning mechanism, the robots retain the experience of successes and failures encountered in games and use it to improve the team's overall performance over time. The cooperative strategy adopts a hierarchical architecture. The top layer learns how to allocate the number of robots per role, that is, given the current situation, how many robots should defend and how many should support the attack. The second layer assigns roles according to the result from the layer above; we develop two algorithms to select the attacker, the sidekicks, and the defenders. The bottom layer executes the behaviors and tasks required by each role: the attacker chases the ball and attacks, the sidekicks learn to take up good positions, and the defenders guard the goal. The selected roles are not permanently fixed; robots can dynamically exchange roles with one another as the situation demands, with each robot carrying out the tasks of its current role so that the team cooperates as a whole. For learning, we slightly modify traditional Q-Learning to obtain Adaptive Q-Learning, verify with a simple artificial-ant experiment that the modified method is indeed more efficient, and successfully apply it to learning the cooperative strategy.
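To make the three-layer flow above concrete, here is a minimal Python sketch. Everything in it (the `Robot` class, `select_role_counts`, `assign_roles`, `execute_behavior`, and the fixed heuristics) is an illustrative assumption, not the thesis's actual design: the thesis learns the top-layer allocation with Adaptive Q-Learning and implements the behaviors with fuzzy logic controllers, whereas the sketch hard-codes simple rules.

```python
# Illustrative sketch of the three-layer cooperative strategy.
# All names and heuristics are hypothetical stand-ins, not the
# thesis's learned policies.
import math
from dataclasses import dataclass

@dataclass
class Robot:
    rid: int
    x: float
    y: float
    role: str = "unassigned"

def select_role_counts(ball_x: float, field_len: float) -> tuple[int, int]:
    """Layer 1: decide how many sidekicks and defenders to field.
    The thesis learns this mapping with Adaptive Q-Learning; a fixed
    position-based heuristic stands in here."""
    if ball_x > 0.6 * field_len:    # ball deep in the opponent half
        return 3, 1                 # (sidekicks, defenders)
    if ball_x < 0.4 * field_len:    # ball near our own goal
        return 1, 3
    return 2, 2

def assign_roles(robots: list[Robot], ball: tuple[float, float],
                 n_sidekicks: int, n_defenders: int) -> None:
    """Layer 2: the robot closest to the ball attacks, the next-closest
    become sidekicks, the rest defend. (The thesis proposes two
    assignment algorithms; this is only one plausible ordering.)"""
    by_dist = sorted(robots,
                     key=lambda r: math.hypot(r.x - ball[0], r.y - ball[1]))
    by_dist[0].role = "attacker"
    for r in by_dist[1:1 + n_sidekicks]:
        r.role = "sidekick"
    for r in by_dist[1 + n_sidekicks:1 + n_sidekicks + n_defenders]:
        r.role = "defender"

def execute_behavior(robot: Robot) -> str:
    """Layer 3: each role runs its own behavior (fuzzy controllers in
    the thesis; symbolic commands here)."""
    return {"attacker": "chase_and_shoot",
            "sidekick": "move_to_support_position",
            "defender": "guard_goal"}.get(robot.role, "idle")

if __name__ == "__main__":
    team = [Robot(i, x=10.0 * i, y=0.0) for i in range(5)]
    ball = (25.0, 5.0)
    sidekicks, defenders = select_role_counts(ball[0], field_len=100.0)
    assign_roles(team, ball, sidekicks, defenders)
    for r in team:
        print(r.rid, r.role, execute_behavior(r))
```

The point of the hierarchy is separation of concerns: the top layer only decides how many robots fill each role, so the learning problem it faces stays small, while role assignment and behavior execution are handled locally by the layers below.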
Abstract (English):
The objective of this thesis is to develop a self-learning cooperative strategy for robot soccer systems. The strategy enables robots to cooperate and coordinate with each other to achieve the objectives of offense and defense. Through the mechanism of learning, the robots can learn from the experience of both successes and failures, and utilize this experience to improve their performance gradually. The cooperative strategy is built on a hierarchical architecture. The first layer of the structure is responsible for allocating roles, that is, deciding how many defenders and sidekicks should play according to the positional states. The second layer performs the role assignment based on the decision from the previous layer; we develop two algorithms for assigning the roles of attacker, defender, and sidekick. The last layer is the behavior layer, in which the robots execute the behavior commands and tasks of their roles: the attacker is responsible for chasing the ball and attacking, the sidekicks are responsible for finding good positions, and the defenders are responsible for preventing the opponent from scoring. The robots' roles are not fixed; they can dynamically exchange roles with each other. For learning, we develop an Adaptive Q-Learning method modified from traditional Q-Learning. In a simple artificial-ant experiment it proves more effective than traditional Q-Learning, and it is also applied successfully to learning the cooperative strategy.
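The abstracts leave the actual modification of Q-Learning to Chapter 3, so the following is only a point of reference: a standard tabular Q-learning agent whose learning rate decays with the per-(state, action) visit count, which is one common way to make the step size adaptive. The class name and the decay schedule are assumptions for illustration, not the thesis's rule.

```python
import random
from collections import defaultdict

class AdaptiveQLearner:
    """Tabular Q-learning whose learning rate alpha decays with the
    visit count of each (state, action) pair -- one plausible reading
    of 'adaptive'; the thesis's exact modification is in Chapter 3."""

    def __init__(self, actions, gamma=0.9, epsilon=0.1):
        self.q = defaultdict(float)     # Q(s, a), defaults to 0
        self.visits = defaultdict(int)  # visit count per (s, a)
        self.actions = actions
        self.gamma = gamma              # discount factor
        self.epsilon = epsilon          # exploration rate

    def choose(self, state):
        """Epsilon-greedy action selection."""
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def update(self, s, a, r, s_next):
        """Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_b Q(s',b) - Q(s,a)),
        with alpha = 1 / N(s,a) instead of a fixed constant."""
        self.visits[(s, a)] += 1
        alpha = 1.0 / self.visits[(s, a)]
        target = r + self.gamma * max(self.q[(s_next, b)]
                                      for b in self.actions)
        self.q[(s, a)] += alpha * (target - self.q[(s, a)])
```

In a grid world like the 10×10 artificial-ant lattice of Chapter 3, the states would be cells, the actions the four moves, and the reward the food pickup; a 1/N(s,a) schedule satisfies the classical stochastic-approximation conditions (step sizes sum to infinity while their squares stay finite), which Q-learning's convergence guarantee requires.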
Table of Contents
Chapter 1 Introduction .................................1
1.1 Motivation .....................................1
1.2 Reinforcement Learning .........................1
1.2.1 Markov Decision Processes (MDP) ..........2
1.2.2 Temporal Difference (TD) Learning ........5
1.2.3 SARSA ....................................6
1.2.4 Q-Learning ...............................7
1.3 Fuzzy Logic Controller .........................8
1.3.1 Fuzzification Layer ......................8
1.3.2 Fuzzy Rule Base ..........................9
1.3.3 Inference Engine .........................10
1.3.4 Defuzzification Layer ....................11
1.4 Thesis Organization ............................12
Chapter 2 Background of Robot Soccer ...................13
2.1 Robot Soccer Game ..............................13
2.2 Multi-Agent Systems Overview ...................16
Chapter 3 Improvement of Q-Learning ....................19
3.1 Introduction ...................................19
3.2 Adaptive Q-Learning ............................20
3.3 Simulation (Artificial Ant Problem) ............23
3.4 Concluding Remarks .............................25
Chapter 4 Cooperative Strategy Design ..................26
4.1 Introduction ...................................26
4.2 Strategy Architecture ..........................27
4.3 Strategy Selection Design ......................29
4.4 Role Assignment ................................32
4.5 Behaviors Design ...............................34
4.5.1 Common Behavior (Obstacle Avoidance) .....34
4.5.2 Attacker Behavior ........................36
4.5.3 Defender Behavior ........................37
4.5.4 Sidekick Behavior ........................39
Chapter 5 Experiments and Discussion ...................44
5.1 Introduction ...................................44
5.2 Results ........................................45
Chapter 6 Conclusions and Future Work ..................49
6.1 Conclusions ....................................49
6.2 Future Work ....................................50
References .............................................51
List of Figures
Figure 1.1 Reinforcement Learning ....................2
Figure 1.2 MDP .......................................3
Figure 1.3 Simplest TD Method ........................5
Figure 1.4 Fuzzy Logic Controller ....................9
Figure 2.1 Robot soccer system configuration .........14
Figure 2.2 5-vs-5 simulator picture ..................15
Figure 2.3 11-vs-11 simulator picture ................15
Figure 3.1 The Adaptive Q-learning ...................22
Figure 3.2 10×10 lattices ............................23
Figure 3.3 Simulation result of 10×10 ................24
Figure 3.4 Simulation result of 20×20 ................25
Figure 4.1 Cooperative Strategy Architecture .........27
Figure 4.2 Cooperative Strategy Flow Chart ...........28
Figure 4.3 Membership Functions in IF Part ...........30
Figure 4.4 Actions of Strategy Selection .............31
Figure 4.5 Role_Arbiter1 .............................32
Figure 4.6 Role_Arbiter2 .............................34
Figure 4.7 Obstacle Avoidance ........................35
Figure 4.8 Computing the join vector .................35
Figure 4.9 Attacker Behavior .........................36
Figure 4.10 Defender Behavior .........................37
Figure 4.11 Defending Position of the Defender ........38
Figure 4.12 States of Field ...........................39
Figure 4.13 Actions for one sidekick ..................40
Figure 4.14 Actions for two sidekicks .................41
Figure 4.15 Actions for three sidekicks ...............41
Figure 4.16 Membership functions in IF Part ...........42
Figure 5.1 The scores of our robot team ..............45
Figure 5.2 The scores of the opponent team ...........46
Figure 5.3 The score differences between our scores and the opponents' scores ..............46
Figure 5.4 The final analysis result .................48