National Digital Library of Theses and Dissertations in Taiwan


Detailed Record

Author: 楊捷耀
Author (English): Jie-Yao Yang
Title (Chinese): 基於DDPG與後見經驗重播之機器手臂直觀式人機操作界面設計與實現
Title (English): Design and Implementation of Intuitive Human Robot Interface for Tele-operation by DDPG with HER
Advisor: 李祖聖
Advisor (English): Tzuu-Hseng S. Li
Degree: Master's
Institution: National Cheng Kung University
Department: Department of Electrical Engineering
Discipline: Engineering
Field: Electrical and Information Engineering
Thesis type: Academic thesis
Year of publication: 2019
Graduation academic year: 107 (ROC calendar)
Language: English
Number of pages: 90
Keywords (Chinese, translated): motion capture; reinforcement learning; deep deterministic policy gradient; home service robot
Keywords (English): DDPG; home service robot; motion capture; reinforcement learning
Usage statistics:
  • Cited: 0
  • Views: 151
  • Downloads: 1
  • Bookmarks: 0
This thesis designs an intuitive human robot interface for a robotic manipulator that allows a user in an indoor environment to remotely operate the robot or teach it to perform specified motions. The system comprises an arm motion capture system and a motion following and learning network, so that the robot can not only follow and imitate the teacher's motions in real time, but also preserve the trajectory of the teacher's hand by learning it with a network. The motion capture system consists of a depth camera and an operating glove. The depth camera captures images of the teacher's motion, and a human skeleton recognition library extracts skeleton points from them to obtain the position of the arm end point; the operating glove is equipped with an inertial measurement unit (IMU) and a microprocessor to capture the palm orientation and finger motions. To make the IMU measurements more accurate, recursive least squares is used for ellipsoid model identification to calibrate the IMU's zero-offset parameters, and Madgwick's algorithm is then used to estimate the end orientation. The motion trajectory formed by the captured end-point positions and orientations can be learned by a Deep Deterministic Policy Gradient (DDPG) network. DDPG is a hybrid of deep learning and reinforcement learning that contains four deep networks, which allows the system to learn continuous data more accurately. To address the difficulty of convergence during training caused by sparse rewards and a low rate of effective training samples, this thesis adds Hindsight Experience Replay (HER) to increase the effective sampling rate and speed up training, and integrates a Reverse Curriculum Generation (RCG) based training scheme so that a specified end-effector trajectory can be preserved in the form of a network. Experimental results show that the interface provides robust, real-time motion capture, allows the user to tele-operate the manipulator intuitively, and preserves the user's end-effector trajectory through network learning.
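For concreteness, the orientation-estimation step mentioned above can be sketched in Python. This is a generic restatement of the published IMU-only (gyroscope plus accelerometer) Madgwick update, not the code used in the thesis; the function name, the filter gain beta, and the sampling period dt are illustrative assumptions, and the RLS ellipsoid calibration is assumed to have already been applied to the raw measurements.

import numpy as np

def madgwick_imu_update(q, gyro, accel, beta=0.1, dt=0.01):
    """One IMU-only Madgwick step (gyroscope + accelerometer, no magnetometer).
    q     -- current orientation quaternion [w, x, y, z]
    gyro  -- angular rate [gx, gy, gz] in rad/s
    accel -- acceleration [ax, ay, az], any unit (normalized below)
    """
    q = np.asarray(q, dtype=float)
    qw, qx, qy, qz = q
    ax, ay, az = np.asarray(accel, dtype=float) / np.linalg.norm(accel)

    # Objective: mismatch between measured gravity and gravity predicted by q.
    f = np.array([
        2.0 * (qx * qz - qw * qy) - ax,
        2.0 * (qw * qx + qy * qz) - ay,
        2.0 * (0.5 - qx**2 - qy**2) - az,
    ])
    # Jacobian of the objective with respect to q.
    J = np.array([
        [-2.0 * qy,  2.0 * qz, -2.0 * qw, 2.0 * qx],
        [ 2.0 * qx,  2.0 * qw,  2.0 * qz, 2.0 * qy],
        [ 0.0,      -4.0 * qx, -4.0 * qy, 0.0     ],
    ])
    step = J.T @ f
    step /= np.linalg.norm(step)  # normalized gradient-descent correction

    # Quaternion rate from the gyroscope, pulled toward the accelerometer
    # estimate by the gain beta.
    gx, gy, gz = gyro
    q_dot = 0.5 * np.array([
        -qx * gx - qy * gy - qz * gz,
         qw * gx + qy * gz - qz * gy,
         qw * gy - qx * gz + qz * gx,
         qw * gz + qx * gy - qy * gx,
    ]) - beta * step

    q = q + q_dot * dt            # integrate over one sampling period
    return q / np.linalg.norm(q)  # keep the quaternion unit length

In use, such a function would be called once per IMU sample, with the returned quaternion fed back in as q for the next call.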
This thesis proposes an intuitive human robot interface (HRI) that allows a human operator to tele-operate a robot and to teach the robot specific motions. The system contains a motion capture system and a motion learning network. The motion capture system includes an RGB-D camera and an operating glove. The position of the human operator's hand is estimated from the RGB-D images by an open skeleton tracking library, named Openpose, while the posture of the hand is calculated from the information captured by the glove. An inertial measurement unit (IMU) and a microprocessor are mounted on the glove. The data from the IMU are calibrated by the Recursive Least Squares (RLS) method and processed by Madgwick's algorithm. Based on the observed position and posture of the human operator's hand, a motion can be constructed. The motion is then trained through the Deep Deterministic Policy Gradient (DDPG) network by using the Reverse Curriculum Generation (RCG) method. DDPG is a hybrid algorithm combining deep learning and reinforcement learning, and is suitable for continuous data. The network has an Actor-Critic structure and an experience replay buffer, which makes it more stable and helps it avoid overfitting. In addition, Hindsight Experience Replay (HER) is added to the network to improve the speed of convergence and the final performance. Finally, the experiments demonstrate that the proposed HRI system allows the robot to imitate the human operator in real time, and that the imitated motion can be learned by the proposed DDPG network.
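As a rough illustration of how HER increases the number of useful samples under a sparse reward, the sketch below relabels each stored transition with the goal the episode actually reached. It is a minimal sketch, not the thesis implementation; the class name, the "final achieved goal" relabeling strategy, and the distance tolerance tol are assumptions introduced here.

import random
from collections import deque

import numpy as np

class HindsightReplayBuffer:
    """Minimal goal-relabeling replay buffer in the spirit of HER.
    Each stored transition is duplicated with the goal the episode actually
    reached, so a sparse reward still yields 'successful' training samples.
    """

    def __init__(self, capacity=100_000, tol=0.05):
        self.buffer = deque(maxlen=capacity)
        self.tol = tol  # distance below which a goal counts as reached

    def reward(self, achieved, goal):
        # Sparse reward: 0 when the goal is reached, -1 otherwise.
        dist = np.linalg.norm(np.asarray(achieved) - np.asarray(goal))
        return 0.0 if dist < self.tol else -1.0

    def store_episode(self, episode):
        """episode: list of (state, action, next_state, achieved_goal, goal) tuples."""
        final_goal = episode[-1][3]  # goal actually achieved at the end of the episode
        for state, action, next_state, achieved, goal in episode:
            # Original transition, rewarded against the intended goal.
            self.buffer.append((state, action, self.reward(achieved, goal), next_state, goal))
            # Hindsight copy: pretend the finally achieved state was the goal all along.
            self.buffer.append((state, action, self.reward(achieved, final_goal), next_state, final_goal))

    def sample(self, batch_size):
        return random.sample(list(self.buffer), min(batch_size, len(self.buffer)))

Minibatches drawn from such a buffer can then feed the DDPG critic and actor updates as usual; the relabeled copies supply successful examples even when the original goal was never reached.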
Contents
Abstract Ⅰ
Contents Ⅲ
List of Figures Ⅵ
List of Tables Ⅹ
Chapter 1 Introduction 1
1.1 Motivation 1
1.2 Related Work 2
1.2.1 Human Robot Interface (HRI) 2
1.2.2 Motion Learning 3
1.2.3 Deep Reinforcement Learning 3
1.2.4 The Training Method and Modified Structure of DRL 4
1.3 System Overview 4
1.4 Thesis Organization 6
Chapter 2 Real Time Human Motion Capture System 7
2.1 Introduction 7
2.2 Definition of Different Coordinate Systems 8
2.3 End-effector Position Estimation 10
2.3.1 Human Body Skeleton Tracking Using Openpose 11
2.3.2 Jitter Removal Filter 12
2.3.3 Anti Z-axis Rotation 14
2.4 Palm Orientation Estimation 16
2.4.1 Inertial Measurement Unit (IMU) 16
2.4.2 IMU Calibration[42] 17
2.4.3 Madgwick’s Algorithm [44] 20
2.5 Experiments 30
Chapter 3 DDPG Based Manipulator Goal Tracking System 33
3.1 Introduction 33
3.2 System Overview 34
3.3 Deep Deterministic Policy Gradient (DDPG) 35
3.4 DDPG on 7-DOF Manipulator 45
3.5 Simulation and Experiments 56
Chapter 4 Enhancing DDPG with HER and RCG 61
4.1 Introduction 61
4.2 Model Training Acceleration with HER 62
4.3 Simulation Result of DDPG with HER 65
4.4 Reverse Curriculum Generation (RCG) [54] Inspired Motion Representation 68
4.5 Simulation Result of RCG Inspired Training 72
Chapter 5 Experiments 75
5.1 Introduction 75
5.2 Experimental Environment Configuration 76
5.3 Experiment I: Tolerance Test for Different Users 77
5.4 Experiment II: Simple Service Task and Interact with Human 79
Chapter 6 Conclusions and Future Work 82
6.1 Conclusions 82
6.2 Future Work 84
References 85
[1] L. Peppoloni, F. Brizzi, C. Avizzano and E. Ruffaldi, “Immersive ros-integrated framework for robot teleoperation,” in Proceedings of Symposium on 3D User Interfaces, pp. 177-178, 2015.
[2] A. Billard, S. Calinon, R. Dillmann and S. Schaal, “Robot programming by demonstration,” in Springer Handbook of Robotics, pp. 1371-1394, 2008.
[3] K. S. Fu, R. C. Gonzalez and C. S. G. Lee, “Robotics: control, sensing, vision, and intelligence,” New York: McGraw-Hill, 1987.
[4] D. Silver, J. Schrittwieser, K. Simonyan, I. Antonoglou, A. Huang, A. Guez and Y. Chen, “Mastering the game of go without human knowledge,” Nature, vol. 550, no. 7676, pp. 354-359, 2017.
[5] V. Mnih, A. P. Badia, M. Mirza, A. Graves, T. Lillicrap, T. Harley and K. Kavukcuoglu, “Asynchronous methods for deep reinforcement learning,” in Proceedings of International Conference on Machine Learning, pp. 1928-1937, 2016.
[6] H. Van Hasselt, A. Guez and D. Silver, “Deep reinforcement learning with double q-learning,” in Proceedings of AAAI Conference on Artificial Intelligence, 2016.
[7] C. Schou, J. S. Damgaard, S. Bøgh and O. Madsen, “Human-robot interface for instructing industrial tasks using kinesthetic teaching,” in Proceedings of International Symposium on Robotics, pp. 1-6, 2013.
[8] Y. Ou, J. Hu, Z. Wang, Y. Fu, X. Wu and X. Li, “A real-time human imitation system using Kinect,” International Journal of Social Robotics, vol. 7, no. 5, pp. 587-600, 2015.
[9] M. Alibeigi, S. Rabiee and M. Ahmadabadi, “Inverse kinematics based human mimicking system using skeletal tracking technology,” Journal of Intelligent and Robotic Systems, vol. 85, no. 1, pp. 27-45, 2016.
[10] J. Lei, M. Song, Z. N. Li and C. Chen, “Whole-body humanoid robot imitation with pose similarity evaluation,” Signal Processing, vol. 108, pp. 136-146, 2015.
[11] M. Zhang, J. Chen, X. Wei and D. Zhang, “Work chain-based inverse kinematics of robot to imitate human motion with Kinect,” ETRI Journal, vol. 40, no. 4, pp. 511-521, 2018.
[12] P. Liang, L. Ge, Y. Liu, L. Zhao, R. Li and K. Wang, “An augmented discrete-time approach for human-robot collaboration,” Discrete Dynamics in Nature and Society, vol. 2016, Article ID 9126056, http://dx.doi.org/10.1155/2016/9126056, 2016.
[13] L. Zhao, Y. Liu, K. Wang, P. Liang and R. Li, “An intuitive human robot interface for tele-operation,” in Proceedings of IEEE International Conference on Real-time Computing and Robotics, pp. 454-459, 2016.
[14] C. Yang, J. Luo, Y. Pan, et al., “Personalized variable gain control with tremor attenuation for robot teleoperation,” IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 99, pp. 1-12, 2017.
[15] C.-Y. Chen, “Board learning system based gaze and gesture classification for telerobotics interface,” M.S. thesis, Dept. Elect. Eng., National Cheng Kung Univ., Tainan, Taiwan, 2018.
[16] J. Kofman, X. Wu, T. J. Luu and S. Verma, “Teleoperation of a robot manipulator using a vision-based human-robot interface,” IEEE Transactions on Industrial Electronics, vol. 52, no. 5, pp. 1206-1219, 2005.
[17] V. Mnih, A. P. Badia, M. Mirza, A. Graves, T. Lillicrap, T. Harley and K. Kavukcuoglu, “Asynchronous methods for deep reinforcement learning,” in Proceedings of International Conference on Machine Learning, pp. 1928-1937, June 2016.
[18] Y. Yuan, Z. Li, T. Zhao and D. Gan, “DMP-based motion generation for a walking exoskeleton robot using reinforcement learning,” IEEE Transactions on Industrial Electronics (Early Access), DOI: 10.1109/TIE.2019.2916396, 2019.
[19] S. Calinon, F. Guenter and A. Billard, “On learning, representing, and generalizing a task in a humanoid robot,” IEEE Transactions on Systems, Man, and Cybernetics, Part B, vol. 37, no. 2, pp. 286-298, 2007.
[20] M. J. Zeestraten, I. Havoutis, J. Silvério, S. Calinon and D. G. Caldwell, “An approach for imitation learning on Riemannian manifolds,” IEEE Robotics and Automation Letters, vol. 2, no. 3, pp. 1240-1247, 2017.
[21] R. Elbasiony and W. Gomaa, “Humanoids skill learning based on real-time human motion imitation using Kinect,” Intelligent Service Robotics, vol. 11, no. 2, pp. 149-169, 2018.
[22] A. C. Dometios, Y. Zhou, X. S. Papageorgiou, C. S. Tzafestas and T. Asfour, “Vision-based online adaptation of motion primitives to dynamic surfaces: application to an interactive robotic wiping task,” IEEE Robotics and Automation Letters, vol. 3, no. 3, pp. 1410-1417, 2018.
[23] S. Ren, K. He, R. Girshick and J. Sun, “Faster R-CNN: towards real-time object detection with region proposal networks,” in Proceedings of Advances in Neural Information Processing Systems Conference, pp. 91-99, 2015.
[24] C. Szegedy, S. Ioffe, V. Vanhoucke and A. A. Alemi, “Inception-v4, Inception-ResNet and the impact of residual connections on learning,” in Proceedings of AAAI Conference on Artificial Intelligence, pp. 4078-4284, 2017.
[25] V. R. Konda and J. N. Tsitsiklis, “Actor-Critic algorithms,” in Proceedings of Advances in Neural Information Processing Systems Conference, pp. 1008-1014, 2000.
[26] C. J. Watkins and P. Dayan, “Q-learning,” Machine Learning, vol. 8, no. 3-4, pp. 279-292, 1992.
[27] V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare and S. Petersen, “Human-level control through deep reinforcement learning,” Nature, vol. 518, no. 7540, p. 529, 2015.
[28] S. Gu, T. Lillicrap, I. Sutskever and S. Levine, “Continuous deep q-learning with model-based acceleration,” in Proceedings of International Conference on Machine Learning, pp. 2829-2838, June 2016.
[29] T. P. Lillicrap, J. J. Hunt, A. Pritzel, N. Heess, T. Erez, Y. Tassa and D. Wierstra, “Continuous control with deep reinforcement learning,” arXiv:1509.02971, 2015.
[30] A. Y. Ng, D. Harada and S. Russell, “Policy invariance under reward transformations: theory and application to reward shaping,” in Proceedings of International Conference on Machine Learning, vol. 99, pp. 278-287, 1999.
[31] D. Dewey, “Reinforcement learning and the reward engineering principle,” in Proceedings of AAAI Spring Symposium Series, pp. 13-16, 2014.
[32] G. Lample and D. S. Chaplot, “Playing FPS games with deep reinforcement learning,” in Proceedings of AAAI Conference on Artificial Intelligence, pp. 2140-2146, 2017.
[33] T. D. Kulkarni, K. Narasimhan, A. Saeedi and J. Tenenbaum, “Hierarchical deep reinforcement learning: integrating temporal abstraction and intrinsic motivation,” in Advances in Neural Information Processing Systems, pp. 3675-3683, 2016.
[34] X. B. Peng, G. Berseth, K. Yin and M. Van De Panne, “DeepLoco: dynamic locomotion skills using hierarchical deep reinforcement learning,” ACM Transactions on Graphics, vol. 36, no. 4, p. 41, 2017.
[35] Y. Wu and Y. Tian, “Training agent for first-person shooter game with actor-critic curriculum learning,” in Proceedings of International Conference on Learning Representations, 2017.
[36] C. Florensa, D. Held, M. Wulfmeier, M. Zhang and P. Abbeel, “Reverse curriculum generation for reinforcement learning,” arXiv:1707.05300, 2017.
[37] M. Andrychowicz, F. Wolski, A. Ray, J. Schneider, R. Fong, P. Welinder and W. Zaremba, “Hindsight experience replay,” in Proceedings of Advances in Neural Information Processing Systems Conference, pp. 5048-5058, 2017.
[38] CMU-Perceptual-Computing-Lab. [Online] Available: https://github.com/CMU-Perceptual-Computing-Lab/openpose
[39] Z. Cao, G. Hidalgo, T. Simon, S. E. Wei and Y. Sheikh, “OpenPose: realtime multi-person 2D pose estimation using part affinity fields,” arXiv:1812.08008, 2018.
[40] MPU-9250. [Online] Available: https://www.invensense.com/products/motion-tracking/9-axis/mpu-9250/
[41] Intel RealSense D435i. [Online] Available: https://www.intelrealsense.com/depth-camera-d435i/
[42] W. K. Nicholson, “Linear algebra with applications,” McGraw-Hill Ryerson, 2006, pp. 424-433.
[43] L. E. Spence, A. J. Insel and S. H. Friedberg, “Elementary linear algebra: a matrix approach,” Pearson College Div, 2017.
[44] S. Madgwick, “An efficient orientation filter for inertial and inertial/magnetic sensor arrays,” Report x-io and University of Bristol (UK), vol. 25, pp. 113-118, 2010.
[45] Geomagnetism Data. [Online] Available: https://www.usgs.gov/natural-hazards/geomagnetism
[46] Rviz. [Online] Available: http://wiki.ros.org/rviz
[47] Robot Operating System. [Online] Available: https://www.ros.org/
[48] D. Silver, G. Lever, N. Heess, T. Degris, D. Wierstra and M. Riedmiller, “Deterministic policy gradient algorithms,” in Proceedings of International Conference on Machine Learning, 2014.
[49] T. Tieleman and G. Hinton, “Lecture 6.5-rmsprop: divide the gradient by a running average of its recent magnitude,” COURSERA: Neural Networks for Machine Learning, vol. 4, no. 2, pp. 26-31, 2012.
[50] P. I. Corke, “A simple and systematic approach to assigning Denavit-Hartenberg parameters,” IEEE Transactions on Robotics, vol. 23, no. 3, pp. 590-594, 2007.
[51] D-H convention rules. [Online] Available: https://blog.robotiq.com/how-to-calculate-a-robots-forward-kinematics-in-5-easy-steps
[52] Robotis motors. [Online] Available: http://en.robotis.com/
[53] T. Schaul, D. Horgan, K. Gregor and D. Silver, “Universal value function approximators,” in Proceedings of International Conference on Machine Learning, pp. 1312-1320, 2015.
[54] Arduino Uno. [Online] Available: https://store.arduino.cc/usa/arduino-uno-rev3
[55] C. Florensa, D. Held, M. Wulfmeier, M. Zhang, et al., “Reverse curriculum generation for reinforcement learning,” arXiv:1707.05300, 2017.