National Digital Library of Theses and Dissertations in Taiwan


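Author: 徐政偉
Author (English): Cheng-Wei Hsu
Thesis title: 基於加強式學習建構機器人行為融合之實現
Thesis title (English): An Implementation of Behavior Fusion Approach for Mobile Robots Based on Q-Learning
Advisor: 黃國勝
Advisor (English): Kao-Shing Hwang
Degree: Master's
Institution: National Chung Cheng University
Department: Graduate Institute of Electrical Engineering
Discipline: Engineering
Field: Electrical and Computer Engineering
Thesis type: Academic thesis
Year of publication: 2008
Graduation academic year: 97
Language: Chinese
Number of pages: 65
Keywords (Chinese): 加強式學習
Keywords (English): reinforcement learning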
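In reinforcement learning problems, it is not easy to define a reward and a state space that let a robot learn the behavior we expect. In this thesis, we first teach the robot by demonstrating the actions we want it to learn, and then use an RL-based decision tree to find the rules that relate the inputs and outputs of the demonstrated actions. An RL-based decision tree chooses the test to apply at each node according to a long-term evaluation, rather than following the top-down greedy strategy of conventional decision tree learning.
This thesis uses a behavior fusion learning algorithm, the Fusion Approach Based on Q-Learning (FBQL), which retains the simplicity of designing the actions of individually learned behaviors. The method can also adjust the actions of multiple behaviors at the same time and optimize the overall behavior. The designer only needs to give an operating demonstration for each single behavior so that the robot learns the behaviors to be combined, and then intuitively design a reward for the overall behavior. The FBQL system fuses the actions of the single behaviors using weights W, and a learning rule continually adjusts W to find the most suitable proportion among the behaviors.
For the experiments, we built a mobile robot whose main system is an embedded development board, to demonstrate the robot's capability for complex behavior.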
It is hard to define a state space or a proper reward function in reinforcement learning that makes a robot act as expected. In this thesis, we first demonstrate the expected behavior to the robot. A reinforcement learning-based decision tree approach, which decides how to split according to long-term evaluations instead of a top-down greedy strategy, is then used to find the relationship between the input and output of the demonstration data. We also use a new learning algorithm, the Fused Behavior Q-Learning Algorithm (FBQL), which keeps the simplicity of designing individually learned behaviors. The method further tunes multiple behaviors at the same time, improving their actions in response to reinforcement signals. Initially, the robot imitates each behavior individually; it then combines the learned behaviors through a set of weighting parameters learned by Q-learning, so that it can behave adaptively and optimally in a dynamic environment. In the experiments, we build a mobile robot, powered by an embedded board, to demonstrate the capability of a robot with complex behaviors.
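The abstract describes fusing the actions of several learned behaviors through weights that Q-learning adjusts, but it does not give the update rules. The following is only a minimal sketch of that general idea, not the thesis's FBQL algorithm. It assumes each behavior outputs a (v, omega) velocity pair, that the fused action is a normalized weighted average, and that tabular Q-learning picks among a small fixed set of weight vectors. The names `CANDIDATE_WEIGHTS`, `fuse`, and `WeightSelector`, as well as all parameter values, are hypothetical.

```python
import random

# Hypothetical candidate weight vectors over three behaviors
# (target seeking, obstacle avoidance, positioning); not taken from the thesis.
CANDIDATE_WEIGHTS = [
    (1.0, 0.0, 0.0),
    (0.0, 1.0, 0.0),
    (0.0, 0.0, 1.0),
    (0.5, 0.5, 0.0),
    (1 / 3, 1 / 3, 1 / 3),
]


def fuse(actions, weights):
    """Normalized weighted average of per-behavior (v, omega) actions."""
    total = sum(weights)
    v = sum(w * a[0] for w, a in zip(weights, actions)) / total
    omega = sum(w * a[1] for w, a in zip(weights, actions)) / total
    return v, omega


class WeightSelector:
    """Tabular Q-learning over candidate weight vectors (an assumed design)."""

    def __init__(self, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.q = {}  # maps (state, weight_index) -> estimated return
        self.rng = random.Random(seed)

    def value(self, state, k):
        return self.q.get((state, k), 0.0)

    def choose(self, state):
        """Epsilon-greedy choice of a weight-vector index."""
        n = len(CANDIDATE_WEIGHTS)
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(n)
        return max(range(n), key=lambda k: self.value(state, k))

    def update(self, state, k, reward, next_state):
        """Standard one-step Q-learning update on the chosen weight vector."""
        n = len(CANDIDATE_WEIGHTS)
        best_next = max(self.value(next_state, j) for j in range(n))
        target = reward + self.gamma * best_next
        old = self.value(state, k)
        self.q[(state, k)] = old + self.alpha * (target - old)
```

In a control loop under these assumptions, the robot would call `choose` for the current state, pass the selected weight vector and the individual behavior outputs to `fuse`, execute the fused action, and then call `update` with the reward designed for the overall behavior.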
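Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Research Motivation and Objectives
1.2 Thesis Organization
Chapter 2 Background
2.1 Reinforcement Learning
2.2 Q-Learning
2.3 Decision Tree Induction
2.4 RL-based Decision Tree
2.4.1 RL-based Decision Tree Architecture
2.4.2 State Representation
2.4.3 State Splitting
2.4.4 Reward
2.4.5 DT-based Q-Learning
2.4.6 Research Results
Chapter 3 System Architecture
3.1 Overall Architecture
3.2 Barebone System
3.3 Omnidirectional Vision System
3.4 KS-44B0 Embedded Development Board
3.5 High-Power Motor Driver
3.6 RS232 Software/Hardware
Chapter 4 Fusion Behavior Learning Algorithm (FBQL)
4.1 FBQL Architecture
4.2 Imitation of a Single Behavior
4.3 State Representation
4.4 Output Action of the Fused Behavior
4.5 Reward r
4.6 Updating the Evaluation Value Q
4.7 Updating the Weight W
Chapter 5 Experimental Design and Discussion
5.1 Target-Seeking Behavior
5.2 Obstacle-Avoidance Behavior
5.3 Positioning Behavior
5.4 Fused Behavior
Chapter 6 Conclusions and Future Work
References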