National Digital Library of Theses and Dissertations in Taiwan
Author: 楊璨輝
Author (English): Tsan-Hui Yang
Title: 以加強式學習決策樹建構機器人行為模仿
Title (English): Behavior Cloning by RL-based Decision Tree
Advisor: 黃國勝
Advisor (English): Kao-Shing Hwang
Degree: Master's
Institution: National Chung Cheng University (國立中正大學)
Department: Electrical Engineering
Discipline: Engineering
Field: Electrical and Computer Engineering
Document type: Academic thesis
Publication year: 2006
Graduation academic year: 95
Language: English
Pages: 48
Chinese keywords: 嵌入式系統、足球機器人、決策樹學習、決策樹、加強式學習
English keywords: embedded system, soccer robot, decision tree learning, decision tree, reinforcement learning
Statistics:
  • Cited: 0
  • Views: 443
  • Rating: —
  • Downloads: 120
  • Bookmarked: 3

Abstract

In a reinforcement learning problem, defining an appropriate reward function and state space so that a robot learns the behavior we expect is not a simple task. In this thesis, we first teach the robot by demonstrating the behavior we want it to learn, and then use a decision tree built with a reinforcement-learning model (an RL-based decision tree) to extract the rules relating the inputs and outputs of the demonstrated actions. At each node, the RL-based decision tree decides which test to apply according to a long-term evaluation, rather than the top-down greedy strategy of conventional decision tree learning.
We apply this method to a target-seeking behavior. To improve performance on the target-seeking problem, we add Q-learning after the RL-based decision tree has extracted the rules from the demonstrated actions. The experimental results confirm that adding Q-learning quickly improves performance on the target-seeking problem.
To realize our theory on real hardware, we built a mobile robot based on an embedded development board. For computer vision, an omni-directional camera lets the robot see objects at any angle within a certain range. Supported by the powerful embedded system and the vision system, the robot inherits behavior already learned on a simulator and can continue to run Q-learning onboard to improve its performance.
It is hard to define a state space and a proper reward function in reinforcement learning so that a robot acts as expected. In this thesis, we first demonstrate the expected behavior for the robot. An RL-based decision tree, which decides how to split each node according to long-term evaluations instead of a top-down greedy strategy, then extracts the relationship between the inputs and outputs in the demonstration data.
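The rule-extraction step can be pictured with a minimal, hypothetical sketch: demonstration samples pair a sensed input with the demonstrated action, candidate node tests are scored, and the best test is kept. The `score_split` function below is a simple purity count used only as a stand-in; the thesis method instead scores candidate splits by a long-term reinforcement-learning evaluation.

```python
# Hypothetical sketch of extracting input/output rules from demonstration
# data with a decision tree. Each sample pairs a sensed input (a dict of
# feature values) with the demonstrated action. NOTE: score_split is a
# greedy purity proxy for illustration only; the thesis evaluates candidate
# splits with a long-term RL criterion instead.

def score_split(samples, feature, threshold):
    """Fraction of samples whose action matches the majority on their side."""
    left = [a for x, a in samples if x[feature] <= threshold]
    right = [a for x, a in samples if x[feature] > threshold]

    def majority_count(actions):
        return max((actions.count(a) for a in set(actions)), default=0)

    return (majority_count(left) + majority_count(right)) / len(samples)

def best_split(samples, features):
    """Try every (feature, observed value) pair as a threshold; keep the best."""
    candidates = [(f, x[f]) for x, _ in samples for f in features]
    return max(candidates, key=lambda c: score_split(samples, *c))
```

For example, demonstrations that turn left for negative ball angles and right for positive ones produce a single threshold test separating the two actions.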
We use this method to teach a robot to solve a target-seeking problem. To improve performance on the target-seeking problem, we add Q-learning over the state space produced by the RL-based decision tree. The experimental results show that Q-learning improves performance quickly.
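The Q-learning stage mentioned above is the standard one-step tabular update; a minimal sketch follows. The state names, action set, and learning constants here are hypothetical placeholders; in the thesis, the states are the leaves of the partition produced by the RL-based decision tree.

```python
import random
from collections import defaultdict

# Illustrative tabular Q-learning over tree-defined states. The constants
# and action names are assumptions for this sketch, not the thesis values.
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1
ACTIONS = ["forward", "left", "right"]

Q = defaultdict(float)  # maps (state, action) -> estimated value

def choose_action(state):
    """Epsilon-greedy selection: mostly exploit, occasionally explore."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def update(state, action, reward, next_state):
    """One-step Q-learning: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
```

Because the decision tree already supplies a compact state partition, the Q-table stays small and the values converge quickly, which matches the speed-up reported in the experiments.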
For demonstration, we built a mobile robot powered by an embedded board. With an omni-directional vision system, the robot can detect the ball in any direction within a certain range. With this embedded computing capability and an efficient machine vision system, the robot can inherit behavior learned on a simulator and continue learning with Q-learning to improve its performance on the target-seeking problem.
Chapter 1 Introduction
1.1 Motivation
1.2 Chapter organization
Chapter 2 Background
2.1 Decision Tree Learning
2.1.1 Decision Tree
2.1.2 Decision Tree Learning Induction
2.2 Reinforcement Learning
2.2.1 Reinforcement Learning Induction
2.2.2 Q-Learning Introduction
Chapter 3 RL-based Decision Tree
3.1 RL-based Decision Tree Induction
3.2 RL-based Decision Tree Algorithm
3.3 Target Seeking Problem with RL-based DT
3.4 DT-based Q-learning
Chapter 4 The Soccer Robot System
4.1 The Hardware of the Soccer Robot
4.1.1 QT-2410 Board
4.1.2 Machine Vision
4.1.3 Motor
4.1.4 Power
4.2 The Software of the Soccer Robot
4.2.1 MIZI Linux
4.2.2 The software system of QT2410
4.2.3 Communication and remote subsystem
4.2.4 Machine Vision system
Chapter 5 Experiments
5.1 The Result of the RL-based DT Experiment
5.2 The Result of the DT-based Q-learning Experiment
Chapter 6 Conclusion and Future Work
[1] Nils J. Nilsson, "Introduction to Machine Learning," Robotics Laboratory, Department of Computer Science, Stanford University, Chapter 11, 1997.
[2] Eugene A. Feinberg and Adam Shwartz, "Handbook of Markov Decision Processes: Methods and Applications," Kluwer, 2002.
[3] Decision Trees, HTTP://WWW.AAAI.ORG/AITOPICS/HTML/TREES.HTML
[4] Tom M. Mitchell, "Machine Learning," Chapter 3, McGraw-Hill, 1997.
[5] Tsung-Wen Yang, "Decision Tree Induction Based on Reinforcement Learning Modelling and Its Application on State Space Partition," Master's thesis, National Chung Cheng University, Taiwan, Jul. 2005.
[6] L. Hyafil and R. L. Rivest, "Constructing Optimal Binary Decision Trees is NP-Complete," Information Processing Letters, Vol. 5, 1976.
[7] C. J. C. H. Watkins, "Learning from Delayed Rewards," Ph.D. thesis, Cambridge University, 1989.
[8] S. K. Murthy, "Automatic Construction of Decision Trees from Data: A Multi-Disciplinary Survey," Data Mining and Knowledge Discovery, Vol. 2, pp. 345-389, 1998.
[9] Andrew Kachites McCallum, "Reinforcement Learning with Selective Perception and Hidden State," Ph.D. thesis, University of Rochester, New York, 1995.
[10] Anders Jonsson and Andrew G. Barto, "Automated State Abstraction for Options Using the U-Tree Algorithm," University of Massachusetts Amherst, 2000.
[11] 上海勤研電子, QT-2410 Development Getting-Started Manual (QT-2410開發入門手冊), 勤研電子, 2004.
[12] Ruei-Shiang Hung, "Design and Implementation of Embedded PWM Driving Intelligent Autonomous Car," Master's thesis, National Chung Cheng University, Taiwan, Jul. 2005.
[13] Monica N. Nicolescu and Maja J. Mataric, "Experience-Based Representation Construction from Human and Robot Teachers," in Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, Maui, Hawaii, USA, Oct. 29-Nov. 3, 2001.