臺灣博碩士論文加值系統 (National Digital Library of Theses and Dissertations in Taiwan)

Detailed Record

Author: 林咩 (Yu-Hong Lin)
Title: 具連續行為的Q-learning應用於多重代理人之合作
Title (English): Q-learning with Continuous Action Value in Multi-agent Cooperation
Advisor: 黃國勝 (Kao-Shing Hwang)
Degree: Master's
Institution: 國立中正大學 (National Chung Cheng University)
Department: 電機工程所 (Electrical Engineering)
Discipline: Engineering
Field: Electrical and Information Engineering
Document type: Academic thesis
Academic year of graduation: 94 (ROC calendar)
Language: English
Pages: 46
Chinese keywords: Q-學習 (Q-learning), 隨機式實數單元 (stochastic real-valued unit), 加強式學習 (reinforcement learning)
English keywords: reinforcement learning, Q-learning, Stochastic Real-Valued Unit (SRV)
Usage statistics:
  • Cited by: 0
  • Views: 488
  • Rating:
  • Downloads: 66
  • Bookmarked: 0
Chinese abstract (translated): In this thesis we propose a Q-learning algorithm that produces continuous output actions and apply it to a multi-agent system. The algorithm is demonstrated on a task in which two independently acting robots, connected by a bar, must pass through an environment and avoid its obstacles to reach a goal. Conventional Q-learning requires pre-defined, discrete sets of states and output actions, so its state space is finite; in a real environment, however, the state and action spaces are continuous, and a discrete state space cannot accurately represent the differences between situations that a robot encounters within the same state. We therefore add a learning concept similar to the Stochastic Real-Valued (SRV) unit to standard Q-learning so that the final output action is continuous. This not only brings the simulation results closer to real-world behavior, it also corrects for the state space defined by the original Q-learning, so that the output performs more efficiently in real-world applications.
English abstract: In this thesis, we propose a Q-learning algorithm with a continuous action space and extend it to a multi-agent system. We implement the algorithm in a task in which two robots, connected by a straight bar, take actions independently; they must cooperate to reach the goal while avoiding the obstacles in the environment. Conventional Q-learning needs a pre-defined, discrete state space, so it has finite sets of states and actions. This is impractical because, in the real world, both the states of the environment and the actions are continuous, so when we use Q-learning to generate actions in the world we cannot precisely identify the variations among different situations that fall into the same state. We use the concept of the SRV (Stochastic Real-Valued) unit to train the action in each state, so the resulting action is continuous. This makes the simulation closer to the real world; it also compensates for the pre-defined action space in Q-learning, resulting in a better learning outcome.
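The abstracts describe pairing a conventional discrete Q-table with an SRV-style stochastic real-valued unit so that the executed action becomes continuous. The Python sketch below illustrates that general idea only: a tabular Q-learner chooses a coarse action bin, and a Gaussian SRV-like unit refines it into a continuous control. The 1-D reach-the-goal task, the constants, and the update rules are assumptions made for illustration; they are not the thesis's two-robot algorithm.

```python
import numpy as np

# Hypothetical toy task: an agent at position x in [0, 10] must reach GOAL.
GOAL = 8.0
N_STATES = 20
BASE_ACTIONS = np.array([-1.0, 0.0, 1.0])   # coarse, discrete action set
GAMMA, ALPHA, BETA, EPSILON = 0.95, 0.10, 0.05, 0.10

rng = np.random.default_rng(0)
q = np.zeros((N_STATES, len(BASE_ACTIONS)))       # conventional Q-table
offset = np.zeros((N_STATES, len(BASE_ACTIONS)))  # learned continuous refinement
sigma = np.ones((N_STATES, len(BASE_ACTIONS)))    # exploration width per cell

def state_of(x):
    """Discretize the continuous position x in [0, 10] into a state index."""
    return min(int(x / 10.0 * N_STATES), N_STATES - 1)

def step(x, u):
    """Apply the continuous control u; reward is negative distance to GOAL."""
    x_next = float(np.clip(x + np.clip(u, -1.5, 1.5), 0.0, 10.0))
    return x_next, -abs(x_next - GOAL)

for episode in range(500):
    x = rng.uniform(0.0, 10.0)
    for t in range(60):
        s = state_of(x)
        # epsilon-greedy selection over the discrete action bins
        a = rng.integers(len(BASE_ACTIONS)) if rng.random() < EPSILON else int(np.argmax(q[s]))
        # SRV-like unit: perturb the refined action with Gaussian noise
        u = BASE_ACTIONS[a] + offset[s, a] + sigma[s, a] * rng.standard_normal()
        x_next, r = step(x, u)

        # standard Q-learning update on the discrete table
        td_error = r + GAMMA * np.max(q[state_of(x_next)]) - q[s, a]
        q[s, a] += ALPHA * td_error

        # SRV-flavored update: pull the refinement toward perturbations that gave
        # a positive TD error, and slowly shrink the exploration noise
        offset[s, a] += BETA * td_error * (u - BASE_ACTIONS[a] - offset[s, a])
        sigma[s, a] = max(0.05, sigma[s, a] * 0.999)
        x = x_next
```

The point of the sketch is the division of labor suggested by the abstract: the discrete Q-table still handles credit assignment over states and coarse actions, while the stochastic real-valued refinement absorbs the within-state variation that a purely discrete action set cannot express.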
Table of Contents

CHAPTER 1 INTRODUCTION 1
1.1 System Architecture 2
1.2 Chapter Organization 3
CHAPTER 2 BACKGROUND KNOWLEDGE 4
2.1 Reinforcement Learning 4
2.2 Q-learning 6
2.3 Stochastic Reinforcement Learning 8
CHAPTER 3 THE CONTINUOUS REAL-VALUED SEARCHING NETWORK FOR Q-LEARNING 15
3.1 The searching efficiency of SRV 16
3.2 Stochastic Recording Real-Valued unit 20
3.3 Output State Representation 24
3.4 Q-learning with Stochastic Recording Real-Valued Unit 27
CHAPTER 4 SIMULATIONS 32
4.1 The Environment of the Simulation 32
4.2 The Simulation Result 34
4.2.1 Q-learning in Single-Agent 34
4.2.2 Q-learning with the Stochastic Recording Real-Valued unit in single-agent 35
4.2.3 Q-learning with the Stochastic Recording Real-Valued unit in multi-agent 39
CHAPTER 5 CONCLUSION AND FUTURE WORK 42
REFERENCES 44
References

[1] L. P. Kaelbling, M. L. Littman, and A. W. Moore, "Reinforcement learning: A survey," Journal of Artificial Intelligence Research, vol. 4, pp. 237-285, May 1996.
[2] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction. MIT Press/Bradford Books, March 1998.
[3] C. J. C. H. Watkins, Learning from Delayed Rewards. PhD thesis, Psychology Department, Cambridge University, 1989.
[4] C. J. C. H. Watkins and P. Dayan, "Q-learning," Machine Learning, vol. 8, pp. 279-292, 1992.
[5] Y. Takahashi, M. Takeda, and M. Asada, "Continuous valued Q-learning for vision-guided behavior acquisition," Proceedings of the 1999 IEEE/SICE/RSJ International Conference on Multisensor Fusion and Integration for Intelligent Systems, pp. 255-260, 1999.
[6] V. Gullapalli, "A stochastic reinforcement learning algorithm for learning real-valued functions," Neural Networks, vol. 3, pp. 671-692, 1990.
[7] V. Gullapalli, "Associative reinforcement learning of real-valued functions," Proc. IEEE Int. Conf. Syst., Man, Cybern., Charlottesville, VA, Oct. 1991.
[8] A. G. Barto, R. S. Sutton, and C. W. Anderson, "Neuronlike adaptive elements that can solve difficult learning control problems," IEEE Trans. Syst., Man, Cybern., vol. SMC-13, pp. 834-846, 1983.
[9] A. V. Vasilakos, N. H. Loukas, and K. C. Zikidis, "A.N.A.S.A. II: A novel, real-valued, reinforcement algorithm for neural unit/network," Proceedings of the 1993 International Joint Conference on Neural Networks, 1993.
[10] A. El-Osery and M. Jamshidi, "A stochastic learning automaton based autonomous control robotic agents," Autonomous Control Engineering Center (ACE), University of New Mexico.
[11] K. Patan and T. Parisini, "Stochastic learning methods for dynamic neural networks: simulated and real-data comparisons," Proceedings of the American Control Conference, Anchorage, AK, May 8-10, 2002.
[12] M. Yamamura and T. Onozuka, "Reinforcement learning with knowledge by using a stochastic gradient method on a Bayesian network," 1998 IEEE.
[13] V. Paraskevopoulos, M. I. Heywood, and C. R. Chatwin, "Modular SRV reinforcement learning: An architecture for non-linear control," 1998 IEEE.
[14] S. Mikami and Y. Kakazu, "Extended stochastic reinforcement learning for the acquisition of cooperative motion plans for dynamically constrained agents," manuscript received July 15, 1993.
[15] J.-S. R. Jang, C.-T. Sun, and E. Mizutani, Neuro-Fuzzy and Soft Computing. Prentice Hall, Upper Saddle River, NJ, 1997.
[16] G. I. Papadimitriou, "A new approach to the design of reinforcement schemes for learning automata: Stochastic estimator learning algorithms," IEEE Transactions on Knowledge and Data Engineering, vol. 6, no. 4, August 1994.
[17] R. A. Leaver and P. Mars, "Stochastic computing and reinforcement neural networks," British Aerospace plc, U.K.; University of Durham, U.K.
[18] T. K. Das, A. Gosavi, S. Mahadevan, and N. Marchalleck, "Solving semi-Markov decision problems using average reward reinforcement learning," Management Science, vol. 45, no. 4, April 1999.
[19] N. J. Nilsson, Introduction to Machine Learning. Artificial Intelligence Laboratory, Department of Computer Science, Stanford University, Stanford, CA 94305.