
臺灣博碩士論文加值系統 (National Digital Library of Theses and Dissertations in Taiwan)


Detail Display

Author: 葉耿豪
Author (English): Yeh, Keng-Hao
Title: 機器人步態學習之研究
Title (English): The Study on the Learning of Walking Gaits for Biped Robots
Advisor: 黃國勝
Advisor (English): Hwang, Kao-Shing
Committee: 黃國勝、陳昱仁、蔡清池、李祖聖
Committee (English): Hwang, Kao-Shing; Chen, Yu-Jen; Tsai, Ching-Chih; Li, Tzuu-Hseng
Oral Defense Date: 2012-07-20
Degree: Master's
Institution: 國立中正大學 (National Chung Cheng University)
Department: 電機工程研究所 (Graduate Institute of Electrical Engineering)
Discipline: Engineering
Field: Electrical and Computer Engineering
Document Type: Academic thesis
Year of Publication: 2012
Graduation Academic Year: 100 (ROC calendar)
Language: English
Pages: 91
Keywords (Chinese): 增強式學習、雙足機器人
Keywords (English): Reinforcement learning; Biped robot; Walking robot; Robotics
Usage statistics:
  • Cited: 0
  • Views: 405
  • Downloads: 98
  • Bookmarked: 0
Abstract (translated from the Chinese):

In the domain of humanoid robots, building an 18-dimensional biped robot model and at the same time applying it to balanced walking requires an enormous amount of mathematical derivation and computation. The goal of this thesis is to use reinforcement learning to control the walking and balance of a biped robot. With no prior knowledge of the dynamic model, the robot learns to walk forward stably using only changes in the configuration of its two legs, which leaves its hands free for other tasks. Reinforcement learning is then used again to learn how to shorten the gait so that walking speed increases steadily.
Q-learning can take the correlation between gaits into account while training each posture for stability. We therefore designed a learning architecture in which every decision is based on the previous gait, to handle the complex problem of walking stably and keeping balance in a real, continuous action space. A second architecture addresses walking speed: the robot's motions only need to be designed very densely, and the proposed method then lets the robot walk faster while still maintaining its heading and stability. Because a walking robot tips forward or backward rather easily, the state space is not partitioned uniformly; instead, the states corresponding to forward and backward falls are judged more strictly.
In this thesis, the agent's learning incorporates the intuitive reactions humans use to keep balance, together with an evaluation of the walk, as its learning basis. Since the physical model of the biped robot already embodies knowledge of how to walk, the proposed method improves on that starting point in terms of walking speed and stability.

Abstract (English):

In the context of humanoid biped robots, building a robot model with 18 dimensions and simultaneously applying that model to balance the robot's behavior requires a great deal of mathematical derivation. This thesis presents a study of biped walking and balance control using reinforcement learning. The algorithm lets a robot learn how to walk without any prior knowledge of an explicit dynamics model, achieving stable walking by changing only the gait pattern, which leaves the robot's hands free for other tasks. The robot's motions need only be designed densely; reinforcement learning then discovers how to shorten the partitions between gaits so that walking speed improves steadily.
Q-learning can not only train each gait to be stable, but can also capture the correlation between gaits in a continuous domain. A learning architecture is developed to solve this problem: it spans a set of basic discrete actions to construct a continuous action policy, as sketched below.
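The thesis itself gives no source code, but the idea can be illustrated with a minimal sketch: a tabular Q-function over discretized tilt states, with a continuous joint command formed by softmax-weighting a handful of discrete basis offsets. All names, sizes, and values here (N_STATES, BASIS_ACTIONS, the learning parameters) are illustrative assumptions, not taken from the thesis.

    import numpy as np

    # Illustrative sizes only: discretized body-tilt states and a few basis
    # joint offsets (e.g., ankle-pitch corrections, in radians).
    N_STATES = 12
    BASIS_ACTIONS = np.array([-0.10, -0.05, 0.0, 0.05, 0.10])
    ALPHA, GAMMA = 0.1, 0.9

    Q = np.zeros((N_STATES, len(BASIS_ACTIONS)))

    def continuous_action(state):
        # Span the discrete basis actions into one continuous command by
        # weighting each basis offset with its softmax-normalized Q-value.
        q = Q[state]
        w = np.exp(q - q.max())
        w /= w.sum()
        return float(w @ BASIS_ACTIONS)

    def q_update(state, action_idx, reward, next_state):
        # Standard one-step Q-learning backup for the chosen basis action.
        td_target = reward + GAMMA * Q[next_state].max()
        Q[state, action_idx] += ALPHA * (td_target - Q[state, action_idx])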
On the other hand, a second architecture is developed to address walking speed by reducing the number of poses between gaits. This not only increases walking speed but also preserves direction and stability. Because a walking robot tilts forward or backward rather easily, the states are not partitioned uniformly: the tilt ranges in which the robot tends to fall are judged more strictly, as in the discretization sketch below.
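A minimal sketch of such a non-uniform state partition, assuming body pitch is the monitored variable; the bin edges are invented for illustration and are not the thesis's values.

    import numpy as np

    # Non-uniform pitch bin edges (radians): coarse near upright, finer in
    # the fall-prone forward/backward ranges. Values are illustrative.
    PITCH_EDGES = np.array([-0.30, -0.26, -0.21, -0.15, -0.08,
                             0.08,  0.15,  0.21,  0.26,  0.30])

    def pitch_state(pitch):
        # Map a measured body pitch to a discrete state index; pitches
        # beyond the outer edges fall into the extreme "about to fall" bins.
        return int(np.searchsorted(PITCH_EDGES, pitch))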
In this thesis, the agent incorporates human intuitive balancing knowledge and a walking-evaluation measure while walking. The biped robot first performs its basic walking skill from a priori knowledge and then learns to improve its behavior in terms of walking speed and the restricted positions of its center of mass.
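One way to read "intuitive balancing knowledge plus walking evaluation" is as a shaped reward signal. The sketch below is a guess at such a signal under that reading, not the thesis's actual reward function; the weights and the fall penalty are assumptions.

    def step_reward(forward_progress, pitch, fell):
        # Hypothetical shaped reward: favor forward progress, penalize body
        # tilt (the "intuitive balance" term), and end a fallen episode with
        # a large penalty. All weights are illustrative.
        if fell:
            return -100.0
        return 1.0 * forward_progress - 5.0 * abs(pitch)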

Chinese Abstract ii
Abstract iii
Table of contents iv
List of figures vii
List of tables ix
I. Introduction 1
1.1 Preface 1
1.2 The Plan and Motive of Research 2
1.3 Dissertation Organization 2
1.4 Review of the Literature 3
II. Application Principles 4
2.1 Reinforcement Learning 4
2.2 Markov Decision Process 5
2.3 Q-learning Algorithm 8
2.4 The Denavit-Hartenberg Representation 10
III. Proposed Method 13
3.1 Tune-up Walking Pose 13
3.1.1 State Space Construction 14
3.1.2 Reward Function 15
3.1.3 Action Space 16
3.1.4 Policy Update 16
3.2 Pose Reduction 17
3.2.1 State Space Construction 18
3.2.2 Reward Function 18
3.2.3 Action Space 18
3.2.4 Policy Update 18
IV. Tune-up Walking Pose 20
4.1 Simulation 20
4.1.1 Simulation Environment 21
4.1.2 Biped Robot 22
4.2 Tune-up 24
4.2.1 Pose Design 25
4.2.2 Simulation Parameters Setting 28
4.2.3 Result 34
V. Pose Reduction 38
5.1 Reduce Poses 38
5.1.1 Simulation Parameters Setting 39
5.1.2 Result 42
5.2 Experiment 50
5.2.1 HSV Color Space 54
5.2.2 Experiment Environment 56
5.2.3 Biped Robot 56
5.2.4 Experiment Parameters Setting 60
5.2.5 Experiment Result 62
5.3 Adaptive Degree 70
5.3.1 The Adaptive Degree Method 70
5.4 Walking on Slope 71
5.4.1 Simulation Environment 71
5.4.2 Simulation 72
5.4.3 Experiment 74
VI. Conclusions 75
6.1 Summary 75
6.2 Future Work 76
References 78
VITA 81

