National Digital Library of Theses and Dissertations in Taiwan

Author: 梁景舜
Author (English): Ching-Shun Liang
Title: 基於模型之深度強化學習
Title (English): Model-Based Deep Reinforcement Learning
Advisor: 黃國勝
Advisor (English): Hwang, Kao-Shing
Degree: Master's
Institution: National Sun Yat-sen University
Department: Department of Electrical Engineering
Discipline: Engineering
Academic Field: Electrical and Computer Engineering
Thesis Type: Academic thesis
Publication Year: 2021
Graduation Academic Year: 110 (2021–22)
Language: Chinese
Pages: 82
Keywords (Chinese): 強化學習、卷積神經網路、自編碼器、間接式學習、街機學習環境
Keywords (English): Reinforcement Learning, Convolutional Neural Network, Autoencoder, Indirect Learning, Arcade Learning Environment
Record statistics:
  • Times cited: 0
  • Views: 178
  • Ratings:
  • Downloads: 11
  • Bookmarked: 0
Abstract (translated from Chinese):
This thesis proposes a model built from convolutional neural networks, with an autoencoder serving as the feature extractor for the environment. By training the model's network parameters so that the model as a whole approximates the environment, the model can generate so-called simulated experience. While the agent performs direct learning on the real experience obtained from the environment, the model is trained at the same time; once its training is complete, it can output additional simulated experience for the agent's indirect learning, improving the agent's performance in the environment. For generating simulated experience, two conceptually different methods are proposed, forward prediction and backward prediction, and each is combined with the ideas of breadth search and depth search, yielding the four simulated-experience generation schemes of this thesis.
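
This record does not include the thesis's source code, so the following is only a minimal sketch of what such an environment model could look like, assuming PyTorch, 84×84 grayscale frames, and one-hot encoded actions; the class name `EnvironmentModel` and all layer sizes are illustrative assumptions, not taken from the thesis.

```python
import torch
import torch.nn as nn

class EnvironmentModel(nn.Module):
    """Sketch of an autoencoder-style environment model: encode the current
    frame, condition on the chosen action, and decode the predicted next
    frame plus a scalar reward estimate."""

    def __init__(self, n_actions: int, latent_dim: int = 256):
        super().__init__()
        # Encoder: convolutional feature extractor (the autoencoder's encoder half).
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=8, stride=4), nn.ReLU(),   # 84 -> 20
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),  # 20 -> 9
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),  # 9  -> 7
            nn.Flatten(),
            nn.Linear(64 * 7 * 7, latent_dim),
        )
        # Transition: predict the next latent state from (latent, action).
        self.transition = nn.Sequential(
            nn.Linear(latent_dim + n_actions, latent_dim), nn.ReLU(),
            nn.Linear(latent_dim, latent_dim),
        )
        # Decoder: reconstruct the predicted next frame from the latent.
        self.fc_decode = nn.Linear(latent_dim, 64 * 7 * 7)
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),  # 7  -> 9
            nn.ConvTranspose2d(64, 32, kernel_size=4, stride=2), nn.ReLU(),  # 9  -> 20
            nn.ConvTranspose2d(32, 1, kernel_size=8, stride=4),              # 20 -> 84
        )
        # Reward head: scalar reward for the simulated transition.
        self.reward_head = nn.Linear(latent_dim, 1)

    def forward(self, frame: torch.Tensor, action: torch.Tensor):
        z = self.encoder(frame)                        # (B, latent_dim)
        za = torch.cat([z, action], dim=1)             # append one-hot action
        z_next = self.transition(za)
        h = self.fc_decode(z_next).view(-1, 64, 7, 7)
        next_frame = self.decoder(h)                   # predicted next frame
        reward = self.reward_head(z_next).squeeze(1)   # predicted reward
        return next_frame, reward
```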

The experiments use the Arcade Learning Environment (ALE) for Atari games, the environment first used in the original Deep Q-Learning experiments. In this environment the architecture and methods proposed here outperform the Deep Q-Learning method in most cases, and they also improve the agent's learning in partially observable (POMDP) settings, demonstrating that the extra simulated experience output by the model benefits the agent's learning.
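
As a rough illustration of how the forward variants of the four simulated-experience schemes could slot into a Dyna-style loop, here is a sketch in plain Python; the function names, the `model(state, action) -> (next_state, reward)` interface, and the fixed rollout depth are assumptions for illustration (the backward variants, which predict predecessor states, are analogous and omitted here).

```python
from typing import Callable, List, Tuple

# A transition is (state, action, reward, next_state); states are opaque here.
Transition = Tuple[object, int, float, object]
# The learned environment model: model(state, action) -> (next_state, reward).
Model = Callable[[object, int], Tuple[object, float]]

def forward_breadth(model: Model, state: object, n_actions: int) -> List[Transition]:
    """Forward breadth prediction: from one real state, simulate every
    action one step ahead, giving a wide batch of simulated transitions."""
    batch = []
    for a in range(n_actions):
        next_state, reward = model(state, a)
        batch.append((state, a, reward, next_state))
    return batch

def forward_depth(model: Model, state: object,
                  policy: Callable[[object], int], depth: int) -> List[Transition]:
    """Forward depth prediction: roll the model forward along the agent's
    current policy for `depth` steps, giving a deep chain of transitions."""
    batch = []
    for _ in range(depth):
        a = policy(state)
        next_state, reward = model(state, a)
        batch.append((state, a, reward, next_state))
        state = next_state
    return batch

def dyna_update(q_update: Callable[[Transition], None],
                real: Transition, simulated: List[Transition]) -> None:
    """Dyna-style step: direct learning on the real transition, then
    indirect learning on the model-generated simulated transitions."""
    q_update(real)                 # direct learning
    for t in simulated:            # indirect learning
        q_update(t)
```
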
Abstract (English):
This thesis proposes a model composed of convolutional neural networks, which uses an autoencoder as the feature extractor for the environment. We train this model to approximate the environment so that it can produce simulated experience. While the agent uses the real experience obtained from the environment for direct learning, the model is trained simultaneously. After the model's training is completed, it can output additional simulated experience, different from that given by the environment, to improve the agent's performance through indirect learning. To generate simulated experience, we propose two different methods, forward prediction and backward prediction, each of which is combined with the ideas of breadth search and depth search.
The experimental environment of this thesis is the Arcade Learning Environment (ALE) of Atari games, which was first used in the Deep Q-Learning experiments. Most of the methods proposed in this thesis perform better than the Deep Q-Learning method in this environment. In addition, the agent with indirect learning performs better in Partially Observable Markov Decision Process (POMDP) environments, which shows that simulated experience is helpful for the agent's learning in the environment.
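
For readers unfamiliar with ALE, here is a minimal sketch of driving one of the games used in the experiments (Pong) with a random policy; it assumes the `gymnasium` package with its Atari extras installed (the thesis itself predates this API, so the exact package name and environment id are assumptions, not what the author used).

```python
import gymnasium as gym

# Requires something like: pip install "gymnasium[atari,accept-rom-license]"
env = gym.make("ALE/Pong-v5")

obs, info = env.reset(seed=0)
terminated = truncated = False
episode_return = 0.0
while not (terminated or truncated):
    action = env.action_space.sample()  # random placeholder policy
    obs, reward, terminated, truncated, info = env.step(action)
    episode_return += reward
env.close()
print("episode return:", episode_return)
```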

Table of Contents:
Thesis Approval Form
Acknowledgements
Chinese Abstract
ABSTRACT
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Research Motivation
1.2 Literature Review
1.3 Thesis Organization
Chapter 2 Background
2.1 Convolutional Neural Network
2.1.1 Convolution Layer
2.1.2 Pooling Layer
2.1.3 Fully Connected Layer
2.2 Autoencoder
2.3 Reinforcement Learning
2.3.1 Policy and Value Function
2.3.2 Q-learning
2.3.3 Deep Q-learning
2.3.4 Dyna Structure
Chapter 3 Methodology
3.1 Environment Simulation Network
3.1.1 Environment Simulation Network Design
3.1.2 Environment Simulation Network Training
3.2 Direct Learning
3.2.1 Direct Learning: Method Design and Procedure
3.3 Indirect Learning
3.3.1 Indirect Learning: Method Design
3.4 Forward Prediction Model
3.4.1 Forward Breadth Prediction
3.4.2 Forward Depth Prediction
3.5 Backward Prediction Model
3.5.1 Backward Breadth Prediction
3.5.2 Backward Depth Prediction
Chapter 4 Simulation Experiments
4.1 Software and Hardware Versions
4.2 Overall Experimental Design
4.2.1 Simulation Environments
4.2.2 Experimental Procedure
4.3 Results and Analysis
4.3.1 Pong Experiments
4.3.2 Space Invaders Experiments
4.3.3 Beam Rider Experiments
4.3.4 POMDP Environment Experiments
Chapter 5 Conclusion and Future Work
5.1 Conclusion
5.2 Future Work
References