National Digital Library of Theses and Dissertations in Taiwan

Detailed Record

Student: 林佳儒 (Chia-Ju Lin)
Title (Chinese): 以部分分享策略之方法實現多隻代理人合作的問題
Title (English): Cooperation between Multiple Agents Based on Partially Sharing Policy
Advisor: 黃國勝 (Kao-Shing Hwang)
Degree: Master's
Institution: National Chung Cheng University
Department: Electrical Engineering
Discipline: Engineering
Field: Electrical and Computer Engineering
Thesis Type: Academic thesis
Year of Publication: 2006
Academic Year of Graduation: 94
Language: English
Pages: 43
Keywords (Chinese): 合作 (cooperation), 加強式學習 (reinforcement learning), 分享 (sharing)
Keywords (English): cooperation, reinforcement learning, WoLF, sharing
Abstract (Chinese, translated):
In human society, learning is a basic rational behavior. People exchange information and knowledge with peers and teachers rather than learning everything from scratch on their own. When a task is too complex for one person to complete alone, he or she may cooperate with partners to accomplish it. Cooperative behavior also exists in other species; ants, for example, communicate the location of food and carry it back to their nest together. By using other agents' experience and knowledge, a learning agent can learn faster, make fewer mistakes, and create rules to adapt to situations it has not yet visited. In WoLF-PHC, a learning algorithm that emphasizes rationality and convergence, a learner is allowed to adapt to its peers by slowing its learning when it is performing well and speeding it up when it is not. In this thesis, we adopt sharing, a characteristic of cooperation, and extend WoLF-PHC to develop our algorithm.
Abstract (English):
In human society, learning is essential to intelligent behavior. However, people do not need to learn everything from scratch through their own discovery; instead, they exchange information and knowledge with others and learn from their peers and teachers. When a task is too complex for an individual to handle, one may cooperate with partners in order to accomplish it. As in human society, cooperation exists in other species: ants, for example, are known to communicate the locations of food and to move it cooperatively. By using other agents' experiences and knowledge, a learning agent may learn faster, make fewer mistakes, and create rules for unstructured situations. In WoLF-PHC, a rational and convergent learning algorithm, an agent adapts to its peers by learning slowly when it is "winning" and quickly when it is not. We retain these two properties and make use of them. In this thesis, we adopt a cooperative behavior observed among animals, the sharing of information, and develop our algorithm based on WoLF-PHC.
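Since the abstract turns on the WoLF ("Win or Learn Fast") principle of a variable policy-learning rate, a minimal sketch of the underlying WoLF-PHC update may help orient the reader. This is an illustrative tabular implementation assuming the standard formulation of Bowling and Veloso [5], not the thesis's WoLF-PSP extension; the hyperparameter values and the clip-and-renormalize projection step are simplifying assumptions made for the sketch.

import numpy as np

class WoLFPHC:
    # Illustrative tabular WoLF-PHC learner (after Bowling & Veloso [5]).
    # Hyperparameter values are assumptions, not the thesis's settings.
    def __init__(self, n_states, n_actions, alpha=0.1, gamma=0.9,
                 delta_win=0.01, delta_lose=0.04):
        self.n_actions = n_actions
        self.alpha, self.gamma = alpha, gamma
        self.delta_win, self.delta_lose = delta_win, delta_lose
        self.Q = np.zeros((n_states, n_actions))                       # action values
        self.pi = np.full((n_states, n_actions), 1.0 / n_actions)      # current policy
        self.pi_avg = np.full((n_states, n_actions), 1.0 / n_actions)  # average policy
        self.visits = np.zeros(n_states)                               # state visit counts

    def act(self, s):
        # Sample an action from the current mixed policy for state s.
        return np.random.choice(self.n_actions, p=self.pi[s])

    def update(self, s, a, r, s_next):
        # 1. Ordinary Q-learning step.
        td_target = r + self.gamma * self.Q[s_next].max()
        self.Q[s, a] += self.alpha * (td_target - self.Q[s, a])

        # 2. Incrementally update the average policy for state s.
        self.visits[s] += 1
        self.pi_avg[s] += (self.pi[s] - self.pi_avg[s]) / self.visits[s]

        # 3. Win or Learn Fast: the agent is "winning" when its current
        #    policy earns more in expectation than its average policy.
        #    Winning -> small step (learn slowly); losing -> large step.
        winning = self.pi[s] @ self.Q[s] > self.pi_avg[s] @ self.Q[s]
        delta = self.delta_win if winning else self.delta_lose

        # 4. Hill-climb the policy toward the greedy action, then clip and
        #    renormalize (a simplified projection onto the probability simplex).
        greedy = self.Q[s].argmax()
        for b in range(self.n_actions):
            step = delta if b == greedy else -delta / (self.n_actions - 1)
            self.pi[s, b] = np.clip(self.pi[s, b] + step, 0.0, 1.0)
        self.pi[s] /= self.pi[s].sum()

Chapter 6 of the thesis then extends this rule so that cooperating agents partially share their learned policies (WoLF-PSP), which is beyond the scope of this sketch.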
Table of Contents
Chapter 1. Introduction
1.1. Chapter Organization
Chapter 2. Background
2.1. Game Theory
2.2. Markov Decision Process (MDP)
2.3. Matrix Games
2.4. Stochastic Games
2.5. Nash Equilibrium
Chapter 3. Preliminaries
3.1. Reinforcement Learning
3.1.1. Q-Learning: An Off-Policy TD Control
3.1.2. Multi-Agent Learning
3.2. Policy Iteration Framework
3.3. Fictitious Play
Chapter 4. Algorithms
4.1. Policy Hill Climbing (PHC) Algorithm
4.2. Win or Learn Fast Policy Hill Climbing (WoLF-PHC) Algorithm
4.2.1. Properties [4]
4.2.2. WoLF-PHC Algorithm
Chapter 5. Weighted Strategy Sharing
Chapter 6. Cooperation between Multiple Agents Based on Partially Sharing Policy (WoLF-PSP)
6.1. WoLF-PSP
6.2. Deadlock Detection and Resolution
Chapter 7. Simulation
7.1. Task Description
7.2. Cooperation between Two Agents in Moving an Object to the Goal
7.2.1. Results
7.3. Cooperation between Two Agents in Passing through a Narrow Gate
7.3.1. Task Description
7.3.2. Simulation Results
7.4. Cooperation between Three Agents
7.4.1. Results
Chapter 8. Conclusion and Future Work
References
[1] M. Tan, "Multi-Agent Reinforcement Learning: Independent vs. Cooperative Agents," in Proceedings of the Tenth International Conference on Machine Learning, Amherst, MA: Morgan Kaufmann, 1993.
[2] B. Banerjee and J. Peng, "Adaptive Policy Gradient in Multiagent Learning," in Proceedings of the Second International Joint Conference on Autonomous Agents and Multiagent Systems, ACM Press, pp. 686–692, 2003.
[3] M. N. Ahmadabadi and M. Asadpour, "Expertness Based Cooperative Q-Learning," IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 32, no. 1, pp. 66–76, February 2002.
[4] M. Bowling and M. Veloso, "Rational and Convergent Learning in Stochastic Games," in Proceedings of the Seventeenth International Joint Conference on Artificial Intelligence, Seattle, WA, pp. 1021–1026, 2001.
[5] M. Bowling and M. Veloso, "Multiagent learning using a variable learning rate," Artificial Intelligence, vol. 136, pp. 215–250, 2002.
[6] M. Bowling, "Multiagent learning in the presence of agents with limitations," Ph.D. dissertation, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, CMU-CS-03-118, May 2003.
[7] G. W. Brown, "Some notes on computation of games solutions," Rand Report P-78, The RAND Corporation, Santa Monica, CA, 1949.
[8] J. Robinson, "An iterative method of solving a game," Annals of Mathematics, vol. 54, pp. 296–301, 1951.
[9] O. J. Vrieze, "Stochastic Games with Finite State and Action Spaces," No. 33 in CWI Tracts, Centrum voor Wiskunde en Informatica, 1987.
[10] E. Yang and D. Gu, "Multiagent reinforcement learning for multi-robot systems: A survey," Technical Report CSM-404, University of Essex, 2004.
[11] R. Hafner and M. Riedmiller, "Reinforcement learning on an omnidirectional mobile robot," in Proceedings of the 2003 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2003), Las Vegas, NV, 2003.
[12] S. Kato and J. Takeno, "Fundamental studies on the application of traffic rules to the mobile robot world: proposal and feasibility study of the traffic rules application system (TRAS)," Advanced Robotics, vol. 2, pp. 1063–1068, 1991.
[13] T. K. Das, A. Gosavi, S. Mahadevan, and N. Marchalleck, "Solving Semi-Markov Decision Problems Using Average Reward Reinforcement Learning," Management Science, vol. 45, no. 4, pp. 560–574, April 1999.
[14] K. S. Hwang, Y. J. Chen, and T. F. Lin, "Q-Learning with FCMAC in Multi-agent Cooperation," Lecture Notes in Computer Science, vol. 3971, pp. 599–602, 2006.
[15] A. Bonarini, A. Lazaric, E. Munoz de Cote, and M. Restelli, "Improving cooperation among self-interested reinforcement learning agents," in ECML Workshop on Reinforcement Learning in Non-Stationary Environments, Porto, Portugal, October 2005.
[16] http://rossum.sourceforge.net/papers/DiffSteer/#d1
[17] Z. Wang, Y. Hirata, and K. Kosuge, "Dynamic Object Closure by Multiple Mobile Robots and Random Caging Formation Testing," in Proceedings of the 2006 IEEE International Conference on Systems, Man, and Cybernetics (SMC 2006), Taiwan, 2006.
[18] E. Monacelli, C. Riman, R. Thieffry, I. Mougharbel, and S. Delaplace, "A Reactive Assistive Role Switching for Interaction Management in Cooperative Tasks," in Proceedings of the 2006 IEEE International Conference on Systems, Man, and Cybernetics (SMC 2006), Taiwan, 2006.
[19] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, Cambridge, MA: MIT Press, 1998.
[20] http://pespmc1.vub.ac.be/ASC/GAME_THEOR.html
[21] X. Meng, R. Babuška, L. Busoniu, Y. Chen, and W. Tan, "An Improved Multiagent Reinforcement Learning Algorithm," in Proceedings of the 2005 IEEE/WIC/ACM International Conference on Intelligent Agent Technology, 2005.