National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)

Detailed Record

Author: 洪贊順
Author (English): Hung, Tsan-Shun
Title: 具混合規劃架構之並行Dyna-Q學習演算法
Title (English): A Hybrid Planning in Concurrent Dyna-Q Learning for Multi-agent Systems
Advisor: 黃國勝
Advisor (English): Hwang, Kao-Shing
Committee members: 黃國勝、陳昱仁、蔡清池、李祖聖
Committee members (English): Hwang, Kao-Shing; Chen, Yu-Jen; Tsai, Ching-Chih; Li, Tzuu-Hseng
Oral defense date: 2012-07-20
Degree: Master's
University: National Chung Cheng University (國立中正大學)
Department: Graduate Institute of Opto-Mechatronics Engineering (光機電整合工程研究所)
Discipline: Engineering
Field of study: Mechanical Engineering
Document type: Academic thesis
Year of publication: 2012
Academic year of graduation: 100 (ROC calendar)
Language: English
Number of pages: 63
Keywords (Chinese): 加強式學習 (reinforcement learning), UCB, Dyna-Q, GPGPU
Keywords (English): Reinforcement learning, UCB, Dyna-Q, GPGPU
Usage statistics:
  • Cited by: 3
  • Views: 334
  • Rating:
  • Downloads: 6
  • Saved to personal bibliography: 0
Chinese abstract (translated):
Traditional reinforcement learning algorithms such as Q-learning are built on a single agent learning one step at a time without a model. In recent years, many researchers have therefore proposed ideas such as multiple agents and repeated learning from a model, for example Dyna-Q and the multi-agent system, to address this low learning efficiency.
In this thesis, we combine algorithms from several different fields, apply their concepts to reinforcement learning, and extend existing ideas such as Dyna-Q and the multi-agent system.
For exploration, we add the UCB algorithm to improve the agents' exploration efficiency and shorten the time needed to build the virtual environment model. For the Dyna-Q virtual environment model, we apply an image-processing concept to sharpen the model.
We also propose a planning algorithm that parallelizes over the environment's state space, which supports parallel computation and speeds up Dyna-Q learning, and we integrate prioritized sweeping into it to further increase planning efficiency and make effective use of computing resources. Building on these extensions and combinations, we use the GPGPU (General Purpose Computing on Graphics Processing Units) approach to implement the simulation on the CUDA (Compute Unified Device Architecture) framework, and we verify through simulation how the proposed methods affect the learning speed of Dyna-Q.
English abstract:
Traditional reinforcement learning algorithms, such as Q-learning, are based on a single agent performing one-step learning without a model. In recent years, many researchers have proposed the concepts of multiple agents and of repeated learning from a model to increase learning efficiency, such as Dyna-Q and the multi-agent system.
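For context, the sketch below illustrates the standard tabular one-step Q-learning update and the basic Dyna-Q planning loop that the abstract refers to. The state/action interface, learning rate, discount factor, and planning-step budget are illustrative assumptions, not the thesis's configuration.

```python
# Minimal sketch of one-step Q-learning plus Dyna-Q style planning.
# The hyperparameters and the (state, action) interface are assumed for illustration.
import random
from collections import defaultdict

ALPHA, GAMMA = 0.1, 0.95      # learning rate and discount factor (assumed values)
PLANNING_STEPS = 10           # model-based replays per real step (assumed value)

Q = defaultdict(float)        # Q[(state, action)] -> estimated return
model = {}                    # model[(state, action)] -> (reward, next_state)

def q_update(s, a, r, s2, actions):
    # One-step backup: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
    best_next = max(Q[(s2, a2)] for a2 in actions)
    Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])

def dyna_q_step(s, a, r, s2, actions):
    # Learn from the real transition, store it in the model,
    # then replay remembered transitions as "planning" updates.
    q_update(s, a, r, s2, actions)
    model[(s, a)] = (r, s2)
    for _ in range(PLANNING_STEPS):
        (ps, pa), (pr, ps2) = random.choice(list(model.items()))
        q_update(ps, pa, pr, ps2, actions)
```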
In this thesis, we integrate several algorithms from different domains, apply their concepts to reinforcement learning, and extend existing frameworks such as Dyna-Q and the multi-agent system.
We add the UCB algorithm to improve the agents' exploration efficiency and to shorten the time needed to build the virtual environment model. For the virtual environment model of Dyna-Q, we apply an image-processing concept to sharpen the model.
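As a rough illustration of the exploration change described above, the following is a minimal sketch of UCB1-style action selection used in place of ε-greedy. The exploration constant c, the tie-breaking behaviour, and the count bookkeeping are assumptions for illustration, not the thesis's exact formulation; the model-sharpening step is not sketched here.

```python
# Sketch of UCB1-style action selection for exploration (assumed formulation).
import math

def ucb_action(state, actions, Q, visit_counts, c=1.4):
    # Pick argmax_a  Q(s,a) + c * sqrt( ln N(s) / N(s,a) );
    # actions never tried in this state are selected first.
    total = sum(visit_counts.get((state, a), 0) for a in actions)

    def score(a):
        n_sa = visit_counts.get((state, a), 0)
        if n_sa == 0:
            return float("inf")
        return Q.get((state, a), 0.0) + c * math.sqrt(math.log(total) / n_sa)

    return max(actions, key=score)
```

The caller is assumed to increment visit_counts[(state, action)] after each real step so that the exploration bonus shrinks for well-tried actions.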
We also propose a planning algorithm that parallelizes over the environment's state space, which enables parallel computing and accelerates Dyna-Q learning. The concept of prioritized sweeping is integrated into it to further increase planning efficiency and make effective use of computing resources. After extending and integrating the above algorithms, the simulation is implemented on the CUDA (Compute Unified Device Architecture) platform using the GPGPU (General Purpose Computing on Graphics Processing Units) approach, and the simulation verifies the impact of the proposed methods on the learning speed of Dyna-Q.
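The following is a minimal, sequential sketch of prioritized sweeping over a learned model, the planning ingredient named above. The thesis maps a sweep of this kind onto the state space in parallel via CUDA, which is not reproduced here; the priority threshold, step sizes, backup budget, and data structures are illustrative assumptions.

```python
# CPU-side sketch of prioritized sweeping over a learned model (assumed parameters).
# model[(s, a)] = (reward, next_state); predecessors[s] = set of (s_prev, a_prev)
# pairs known to lead into s.
import heapq
import itertools

def max_q(Q, s, actions):
    return max(Q.get((s, a), 0.0) for a in actions)

def prioritized_sweeping(Q, model, predecessors, actions,
                         alpha=0.1, gamma=0.95, theta=1e-3, max_backups=200):
    tie = itertools.count()          # tie-breaker keeps heap entries comparable
    pq = []                          # max-heap via negated priorities

    # Seed the queue with every modeled transition whose backup would change Q noticeably.
    for (s, a), (r, s2) in model.items():
        p = abs(r + gamma * max_q(Q, s2, actions) - Q.get((s, a), 0.0))
        if p > theta:
            heapq.heappush(pq, (-p, next(tie), s, a))

    for _ in range(max_backups):
        if not pq:
            break
        _, _, s, a = heapq.heappop(pq)
        r, s2 = model[(s, a)]
        q_sa = Q.get((s, a), 0.0)
        Q[(s, a)] = q_sa + alpha * (r + gamma * max_q(Q, s2, actions) - q_sa)
        # A change at s may raise the priority of transitions that lead into s.
        for (sp, ap) in predecessors.get(s, ()):
            rp, _ = model[(sp, ap)]
            p = abs(rp + gamma * max_q(Q, s, actions) - Q.get((sp, ap), 0.0))
            if p > theta:
                heapq.heappush(pq, (-p, next(tie), sp, ap))
```

In this sketch the backups run sequentially; a per-state parallel version would instead assign each state's sweep to its own worker (e.g., one GPU thread per state), which is the mapping the abstract attributes to the CUDA implementation.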
Chinese Abstract
Abstract
Table of Contents
List of Figures
List of Tables
1. Introduction
1.1 Preface
1.2 Literature Review
1.3 Objectives
1.4 Structure
2. Related Research
2.1 Reinforcement Learning
2.2 Markov Decision Process
2.3 Q-Learning
2.4 Dyna-Q
2.5 UCB
2.6 Prioritized Sweeping
3. Algorithm
3.1 Dyna-Q Algorithm with UCB
3.2 Model Sharpening
3.3 Adverse Dyna-Q
3.4 Hybrid Planning Adverse Dyna-Q with Prioritized Sweeping
3.5 Integration of Multi-agent System and Central Server
4. Simulation
4.1 GPGPU (General Purpose Computing on Graphics Processing Units)
4.2 CUDA (Compute Unified Device Architecture)
4.3 Environmental Design
4.4 Simulation Parameters
4.5 Algorithm Experiment
4.5.1 Comparison of UCB and ε-greedy
4.5.2 Effect of UCB with Model Sharpening
4.5.3 Adverse Dyna-Q with Prioritized Sweeping
4.5.4 Hybrid Planning Adverse Dyna-Q with Prioritized Sweeping
5. Conclusion and Future Perspective
5.1 Conclusion
5.2 Future Perspective
Reference
VITA