National Digital Library of Theses and Dissertations in Taiwan


Detailed Record

Author: 蔡維軒
Author (English): TASI, WEI-HSUAN
Title: Implementation of Bicycle Balance Control Based on Machine Learning
Title (English): The Implementation of Bicycle Balance Control with Deep Deterministic Policy Gradient Model
Advisor: 張慶龍
Advisor (English): CHANG, CHING-LUNG
Committee: 張慶龍, 吳承崧, 李詩偉
Committee (English): CHANG, CHING-LUNG; WU, CHANG-SONG; LI, SU-WEI
Defense date: 2022-07-26
Degree: Master's
Institution: National Yunlin University of Science and Technology
Department: Computer Science and Information Engineering
Discipline: Engineering
Field: Electrical and Computer Engineering
Document type: Academic thesis
Year of publication: 2022
Academic year: 110
Language: Chinese
Pages: 57
Keywords (Chinese, translated): Reinforcement Learning, Machine Learning, Deep Learning, Deep Reinforcement Learning, DDPG
Keywords (English): Reinforcement Learning, Machine Learning, Deep Learning, Deep Reinforcement Learning, Deep Deterministic Policy Gradient
Statistics:
  • Cited: 0
  • Views: 234
  • Score: (none)
  • Downloads: 32
  • Bookmarked: 0
Abstract (translated from Chinese):
Deep Reinforcement Learning (DRL), which combines Reinforcement Learning with Deep Learning, can learn on its own through trial and error, making it well suited to interactive environments such as games. This thesis applies DRL, together with a 9-axis sensor and the principle of a motor-controlled momentum wheel, to achieve autonomous balance control of a bicycle.
According to the characteristics of the output control signal, deep reinforcement learning systems fall into two families: the Deep Q-Network (DQN) architecture, which produces discrete output signals, and the Deep Deterministic Policy Gradient (DDPG) architecture, which produces continuous output signals. This thesis adopts the DDPG architecture to learn how to drive the momentum-wheel motor from the 9-axis sensor readings and thereby balance the bicycle autonomously. We first implement a simulated bicycle-balance learning system in Unity to find suitable DDPG network parameters and a reward function, and then build the physical bicycle balance system on the basis of those parameters. By learning in the real environment and measuring the relevant data, we confirm the feasibility of DDPG for bicycle balance control. Experimental results show that the system can stand stably for more than 80 minutes in the simulated environment, and preliminary real-world tests verify that the overall system design and balance-learning control are feasible.
Abstract (English):
Deep Reinforcement Learning (DRL), which combines Reinforcement Learning (RL) and Deep Learning (DL), has the ability to learn by itself from trial and error, so it is well suited to interactive environments such as games. This thesis is mainly based on DRL technology, combined with a 9-axis sensor and the principle of a motor-controlled momentum wheel, to implement self-balancing control of a bicycle.
According to the characteristics of the output control signal, deep reinforcement learning systems are divided into the Deep Q-Network (DQN), with discrete output signals, and the Deep Deterministic Policy Gradient (DDPG), with continuous output signals. This thesis uses DDPG to learn how to control the momentum-wheel motor according to the 9-axis sensor values to achieve bicycle self-balancing control. First, we use Unity to implement a bicycle balance-control simulation learning system to find suitable DDPG network architecture parameters and reward functions, and then build a real bicycle autonomous balance system based on the obtained parameters. By learning in the real environment and measuring the relevant data, we confirm the feasibility of DDPG learning and control for self-balancing bicycles. The experimental results show that the system can stand stably for more than 80 minutes in the simulated environment, and preliminary real-world verification shows that the whole system design and balance-learning control are feasible.
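The abstract describes a DDPG controller mapping 9-axis sensor readings to a continuous momentum-wheel motor command. This record does not publish the thesis's actual network layout or reward function, so the following Python sketch is purely illustrative: the state dimension, layer sizes, torque bound, and roll-angle reward are all assumptions. It shows only the two DDPG ingredients the abstract names — a deterministic actor with bounded continuous output (unlike DQN's discrete output) and the soft target-network update used to stabilize training.

```python
import numpy as np

# Illustrative dimensions (not from the thesis): a few attitude angles and
# angular rates from the 9-axis sensor, one continuous torque command.
STATE_DIM = 6
ACTION_DIM = 1
MAX_TORQUE = 1.0  # normalized motor command

rng = np.random.default_rng(0)

def init_layer(n_in, n_out):
    """Small random weights and zero bias for one dense layer."""
    return rng.normal(0.0, 0.1, (n_in, n_out)), np.zeros(n_out)

# Deterministic actor: state -> bounded continuous action via tanh,
# which is what distinguishes DDPG from the discrete-output DQN.
W1, b1 = init_layer(STATE_DIM, 32)
W2, b2 = init_layer(32, ACTION_DIM)

def actor(state):
    h = np.tanh(state @ W1 + b1)
    return MAX_TORQUE * np.tanh(h @ W2 + b2)

def soft_update(target, source, tau=0.005):
    """DDPG-style Polyak averaging: target <- tau*source + (1-tau)*target."""
    return tau * source + (1.0 - tau) * target

def reward(roll_angle):
    """Illustrative reward: penalize deviation from upright (roll = 0)."""
    return -abs(roll_angle)

state = np.array([0.05, 0.0, -0.02, 0.0, 0.0, 0.1])  # made-up sensor reading
torque = actor(state)  # always within [-MAX_TORQUE, MAX_TORQUE]
```

In the thesis the action would drive the physical momentum-wheel motor; here `actor(state)` just returns a bounded number, and `soft_update` sketches the target-network step standard in DDPG training.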

Table of Contents
Abstract (Chinese)
Abstract (English)
Acknowledgments
Table of Contents
List of Tables
List of Figures
1. Introduction
1.1. Motivation
1.2. Bicycle Balance System
1.3. Related Work
1.4. Thesis Organization
2. Background
2.1. Principles of the Bicycle Balance System
2.2. Machine Learning
2.3. Reinforcement Learning
2.4. Deep Learning
2.5. Deep Q-Network (DQN) Algorithm
2.6. Deep Deterministic Policy Gradient (DDPG) Algorithm
3. Design of the Bicycle Balance Control System
3.1. System Simulation
3.2. Single-Motor Bicycle Balance Control System
3.3. Two-Motor Bicycle Balance Control System
4. System Implementation
4.1. System Architecture
4.2. Real-World Validation of the Simulation-Trained Model
4.3. Motor Inertia Problem
4.4. Implementation Results
5. Conclusion
References


[1] J. Schmidhuber, "Deep Learning in Neural Networks: An Overview," Neural Networks, vol. 61, pp. 85-117, 2015.
[2] L. P. Kaelbling et al., "Reinforcement Learning: A Survey," Journal of Artificial Intelligence Research, vol. 4, pp. 237-285, 1996.
[3] V. Mnih et al., "Playing Atari with Deep Reinforcement Learning," NIPS Deep Learning Workshop, 2013.
[4] T. P. Lillicrap et al., "Continuous Control with Deep Reinforcement Learning," International Conference on Learning Representations (ICLR), 2016.
[5] S. Lee and W. Ham, "Self Stabilizing Strategy in Tracking Control of Unmanned Electric Bicycle with Mass Balance," IEEE/RSJ International Conference on Intelligent Robots and Systems, 2002, pp. 2200-2205, vol. 3.
[6] M. Yamakita, A. Utano and K. Sekiguchi, "Experimental Study of Automatic Control of Bicycle with Balancer," 2006 IEEE/RSJ International Conference on Intelligent Robots and Systems, 2006, pp. 5606-5611.
[7] A. Suebsomran, "Balancing Control of Bicycle Robot," 2012 IEEE International Conference on Cyber Technology in Automation, Control, and Intelligent Systems (CYBER), 2012, pp. 69-73.
[8] L. P. Tuyen and T. Chung, "Controlling Bicycle Using Deep Deterministic Policy Gradient Algorithm," 2017 14th International Conference on Ubiquitous Robots and Ambient Intelligence (URAI), 2017, pp. 413-417.
[9] G. Belascuen and N. Aguilar, "Design, Modeling and Control of a Reaction Wheel Balanced Inverted Pendulum," 2018 IEEE Biennial Congress of Argentina (ARGENCON), 2018, pp. 1-9.
[10] A. K. Jain et al., "Artificial Neural Networks: A Tutorial," Computer, vol. 29, no. 3, pp. 31-44, 1996.
[11] C. J. C. H. Watkins, Learning from Delayed Rewards, PhD Thesis, University of Cambridge, England, 1989.
[12] L. Alzubaidi, J. Zhang, A. J. Humaidi et al., "Review of Deep Learning: Concepts, CNN Architectures, Challenges, Applications, Future Directions," Journal of Big Data, vol. 8, no. 53, 2021.
[13] S. Albawi, T. A. Mohammed and S. Al-Zawi, "Understanding of a Convolutional Neural Network," 2017 International Conference on Engineering and Technology (ICET), 2017, pp. 1-6.
[14] V. Mnih et al., "Human-level Control through Deep Reinforcement Learning," Nature, vol. 518, pp. 529-533, 2015.
[15] V. R. Konda and J. N. Tsitsiklis, "Actor-Critic Algorithms," Advances in Neural Information Processing Systems 12, pp. 1008-1014, 1999.
[16] R. S. Sutton et al., "Policy Gradient Methods for Reinforcement Learning with Function Approximation," Advances in Neural Information Processing Systems 12, pp. 1057-1063, 1999.
[17] Unity Technologies, "Unity," https://unity.com/
