臺灣博碩士論文加值系統 (National Digital Library of Theses and Dissertations in Taiwan)
詳目顯示 (Detailed Record)

Student: 顏安利
Student (English): Yen, Annie
Title (Chinese): 機械手臂控制: 深度強化學習網路應用於抓取和放置協同作用
Title (English): Robotic Arm Manipulation: Deep Reinforcement Learning Network for Grasping and Placing Synergy
Advisors: 朱威達、連震杰
Advisors (English): Chu, Wei-Ta; Lien, Jenn-Jier
Committee members: 吳進義、連震杰、朱威達、鍾俊輝、江佩如
Committee members (English): Wu, Jin-Yi; Lien, Jenn-Jier; Chu, Wei-Ta; Chung, Chun-hui; Chiang, Pei-Ju
Oral defense date: 2023-07-03
Degree: Master's
Institution: 國立成功大學 (National Cheng Kung University)
Department: 人工智慧機器人碩士學位學程 (Master's Program on Artificial Intelligence and Robotics)
Discipline: Engineering
Field: Other engineering
Document type: Academic thesis
Publication year: 2023
Graduation academic year: 111 (2022–2023)
Language: English
Pages: 57
Keywords (Chinese): 六軸機械手臂、深度增強式學習、物件抓取、取放作業
Keywords (English): 6-DoF industrial manipulator; deep reinforcement learning; object grasping; pick-and-place tasks
Citations: 0
Views: 194
Downloads: 73
Bookmarks: 0
Abstract (translated from the Chinese):
This study addresses the challenge of teaching robots complex multi-step manipulation tasks in randomized environments. To meet this challenge, we propose a framework that combines a dense U-net with model-free reinforcement learning. This synergistic approach enables the robot to combine the operations of grasping and placing objects. The vision-based model, a dense U-net, is trained to predict possible actions, and a Self-Adjusted Exploration (SAE) policy is introduced; the SAE policy selects actions according to a Softmax distribution over recent loss values. The framework also includes a reward mechanism and a Double Deep Q-Network (DDQN) structure that guide the robot's primitive grasping and placing actions toward task completion. We validate the framework by training it in randomized simulated environments. With action efficiency measured as the number of actions required to complete a single task, our method performs comparably to state-of-the-art approaches in both success rate and action efficiency, and ablation studies confirm the beneficial effect of SAE and DDQN in tasks that require exploring possible actions.
Abstract (English):
The study tackles the challenge of teaching robots complex multi-step manipulation tasks within randomized environments. To address this, we present a novel framework that combines a dense U-net model with model-free deep reinforcement learning techniques. This synergistic approach is designed to enable robots to seamlessly integrate the actions of grasping and placing objects. Our vision-based model, the dense U-net, is trained to predict possible actions, and we introduce a Self-Adjusted Exploration (SAE) policy. The SAE policy selects actions based on a Softmax distribution derived from recent loss values. In conjunction with a carefully crafted reward function and a Double Deep Q-Network (DDQN) structure, our framework guides the robot's primitive actions of grasping and placing objects toward successful task completion. We empirically validate our approach by training the framework to excel in grasping and stacking tasks within randomized simulated environments. Our method showcases competitive performance when compared to existing techniques, as measured by success rates and action efficiency (the number of actions required for task completion). Our ablation studies confirm the beneficial impact of both SAE and DDQN, particularly in tasks demanding action exploration.
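To make the abstract's two named mechanisms concrete, the following is a minimal Python sketch of one plausible reading of the SAE behavior policy and of the standard Double DQN target. This is not the thesis's implementation: the function names, the loss-as-temperature coupling, and all constants below are illustrative assumptions.

import numpy as np

def softmax(x):
    """Numerically stable softmax."""
    z = np.exp(x - np.max(x))
    return z / z.sum()

def sae_action(q_values, recent_losses, rng):
    """Self-Adjusted Exploration (SAE), one plausible reading of the
    abstract: the mean of the recent training losses acts as a softmax
    temperature, so a poorly fit network (high loss) samples actions
    broadly, while a well fit one acts near-greedily. The exact
    coupling used in the thesis may differ."""
    temperature = max(float(np.mean(recent_losses)), 1e-3)  # assumed floor
    return int(rng.choice(len(q_values), p=softmax(q_values / temperature)))

def ddqn_target(reward, next_q_online, next_q_target, gamma=0.99, done=False):
    """Standard Double DQN target (van Hasselt et al., 2016): the online
    network selects the best next action, the target network evaluates it."""
    if done:
        return reward
    a_star = int(np.argmax(next_q_online))                 # selection: online net
    return reward + gamma * float(next_q_target[a_star])   # evaluation: target net

# Toy usage with 4 discrete action primitives (e.g. candidate grasp poses).
rng = np.random.default_rng(0)
q = np.array([0.2, 1.5, 0.9, -0.3])
losses = [0.8, 0.6, 0.7]            # high recent loss -> more exploration
print(sae_action(q, losses, rng))   # stochastic action index
print(ddqn_target(1.0, q, 0.9 * q)) # bootstrapped TD target

The design point of the Double DQN split is that letting the online network choose the next action while the target network scores it damps the Q-value overestimation of plain DQN, which matches the abstract's claim that DDQN helps in tasks demanding action exploration.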
Abstract
摘要 (Abstract in Chinese)
致謝 (Acknowledgements)
Table of Contents
List of Tables
List of Figures
List of Symbols and Abbreviations
Chapter 1 Introduction and Related Works
1.1 Introduction
1.2 Contributions
1.3 Related works
1.3.1 RGB image to robot pose
1.3.2 Robot pose based on U-net model predictions
1.3.3 Reinforcement learning for manipulation tasks
1.4 Thesis structure
Chapter 2 State and action of DDQN structure
2.1 State and action formulation
2.2 Reward function
2.3 Double deep Q-network structure
2.3.1 Online network
2.3.2 Target network
2.3.3 Loss update
2.4 Target and behavior policy
2.4.1 Target policy
2.4.2 Behavior policy
Chapter 3 Dense U-net model
3.1 Dense U-net model structure
3.2 Dense U-net encoder
3.3 Dense U-net decoder
3.4 Squeeze-excitation
3.5 Limited channel-wise expansion
Chapter 4 Robot simulation and kinematics
4.1 Simulation
4.1.1 Robot simulation
4.1.2 Robot modeling
4.2 Forward and inverse kinematics
4.2.1 Forward kinematics
4.2.2 Inverse kinematics
Chapter 5 Experiment setup and results
5.1 Simulation setup
5.2 Evaluation metrics
5.3 Experiments
5.4 Results and discussion
5.4.1 Grasping tasks
5.4.2 Stacking tasks
5.5 Ablation studies
5.5.1 Channel-size limitation
5.5.2 DDQN and exploration
Chapter 6 Conclusion and future works
6.1 Conclusion
6.2 Future works
References