Author: 余嘉浩
Author (English): Yu, Chia-Hao
Title: 強化學習與最佳化控制應用於單鏡頭四軸無人機
Title (English): Reinforcement Learning and Optimal Control Applied to Image-Based Multirotor (RLOC)
Advisor: 程登湖
Advisor (English): Cheng, Teng-Hu
Oral defense committee: 鄭泗東、陳宗麟、程登湖
Oral defense committee (English): Cheng, Stone; Chen, Tsung-Lin; Cheng, Teng-Hu
Oral defense date: 2021-08-26
Degree: Master's
Institution: 國立陽明交通大學 (National Yang Ming Chiao Tung University)
Department: 工學院機器人碩士學位學程 (Master's Program in Robotics, College of Engineering)
Discipline: Engineering
Field: Other engineering
Thesis type: Academic thesis
Year of publication: 2021
Academic year of graduation: 110 (ROC calendar)
Language: English
Number of pages: 60
Keywords (Chinese): 強化學習、最佳化控制、四軸旋翼機、卷積神經網絡、循環神經網絡、物件偵測
Keywords (English): Reinforcement learning, Optimal control, Multirotors, Convolutional Neural Network, Recurrent Neural Network, Object detection
Statistics:
  • Times cited: 0
  • Views: 104
  • Rating: none
  • Downloads: 0
  • Bookmarked in personal bibliographies: 1
摘要

This research presents an image-based gate-traversal system for a multirotor. The multirotor relies only on an inexpensive single-lens camera and uses the image information captured by that camera to fly through a target gate, while a reinforcement learning model combined with optimal control theory keeps the flight stable and fast.

Vision plays an indispensable role in this work. We use a single-lens camera that captures RGB images, and a You-Only-Look-Once v3 (YOLOv3) convolutional neural network (CNN) detects the gate in front of the multirotor. Once the gate is detected, we obtain its bounding-box coordinates in the image coordinate system. In addition, the inertial measurement unit (IMU) on the multirotor provides its current attitude, and the actions taken so far are recorded. All of this information is fed into a recurrent neural network (RNN) for training, which finally yields the position of the gate relative to the multirotor and the yaw angle of the gate relative to the multirotor.

With this position information, the optimal-control neural network architecture we developed generates control signals that let the multirotor pass through the gate. This network combines the Hamilton-Jacobi-Bellman (HJB) equation with reinforcement learning so that the multirotor crosses the gate with minimal control effort and in the shortest time. In this way, we can optimally control a multirotor that carries only a single-lens camera, without any depth information.

Because training the reinforcement learning network requires extensive interaction with the environment, and the drone's flight is unstable early in training, all of our experiments were carried out in a simulated environment, both for safety and to comply with local regulations in Taiwan. We use AirSim, the drone simulation environment developed by Microsoft. It provides a large set of convenient APIs for development and renders realistic, high-quality scenes, so the images captured by the camera are not far from those of a real environment.
Abstract

This research provides an image-based control system for multirotors. The multirotor relies on an inexpensive single-lens camera and uses the image information captured by this camera to pass through a target gate; a reinforcement learning model combined with optimal control theory keeps the motion stable and fast during flight.

Images play an indispensable role in this research. We use a single-lens camera that captures RGB colors and a You-Only-Look-Once v3 (YOLOv3) convolutional neural network (CNN) architecture to detect the target gate in front of the multirotor. Once the gate in front of the camera is detected, we obtain the bounding-box coordinates of the gate in the image coordinate system. In addition, the inertial measurement unit (IMU) on the multirotor is used to obtain its current attitude, and the actions taken in the past are recorded. This information is fed into a recurrent neural network (RNN) for training, which finally yields the position and yaw of the target gate relative to the multirotor.
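
To make this estimation pipeline concrete, below is a minimal PyTorch sketch, not the thesis code, of an LSTM-based estimator; the feature layout (bounding box, attitude, angular rates, previous action) and all layer sizes are illustrative assumptions:

```python
# Minimal sketch of the relative-pose estimator described above: per-frame
# features (YOLOv3 bounding box, IMU attitude, previous action) are fed to an
# LSTM that regresses the gate position and yaw relative to the multirotor.
import torch
import torch.nn as nn

class GatePoseRNN(nn.Module):
    def __init__(self, feat_dim=14, hidden_dim=64):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, 4)  # (x, y, z, yaw) of gate w.r.t. body frame

    def forward(self, seq, hidden=None):
        # seq: (batch, time, feat_dim), where each frame concatenates
        #   bounding box (x_min, y_min, x_max, y_max)        -> 4 values
        #   IMU attitude (roll, pitch, yaw) + angular rates   -> 6 values
        #   previous action (4-dim command)                   -> 4 values
        out, hidden = self.lstm(seq, hidden)
        return self.head(out[:, -1]), hidden  # estimate from the last time step

# Example: a batch of 8 clips, 10 frames each.
model = GatePoseRNN()
features = torch.randn(8, 10, 14)
pose, _ = model(features)
print(pose.shape)  # torch.Size([8, 4])
```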

Once the position information is available, the optimal-control neural network architecture we developed generates control signals in time for the multirotor to pass through the gate smoothly. This optimal-control network combines the Hamilton-Jacobi-Bellman (HJB) equation with reinforcement learning, enabling the multirotor to pass through the gate with the smallest movement and in the fastest way. In this way, we can optimally control a multirotor that has only a single-lens camera and no depth information.
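
For reference, the continuous-time setting that an HJB-based reinforcement learning controller of this kind typically builds on can be summarized as follows; the control-affine dynamics and quadratic cost below are a generic textbook form, not necessarily the exact model or weights used in the thesis:

```latex
% Generic HJB setting: control-affine dynamics, infinite-horizon quadratic cost.
\begin{align}
  \dot{x} &= f(x) + g(x)\,u, \qquad
  J(x_0, u) = \int_{0}^{\infty} \left( x^{\top} Q x + u^{\top} R u \right) dt, \\
  0 &= \min_{u} \left[ x^{\top} Q x + u^{\top} R u
        + \nabla V^{*}(x)^{\top} \left( f(x) + g(x)\,u \right) \right], \\
  u^{*}(x) &= -\tfrac{1}{2}\, R^{-1} g(x)^{\top} \nabla V^{*}(x).
\end{align}
```

In this family of methods, a critic network typically approximates the value function V* and an actor network approximates the control u*, with both trained to drive the residual of the HJB equation toward zero.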

Since training the reinforcement learning network requires a great deal of interaction with the environment, and the flight performance of the drone is unstable at the beginning of training, all of our experiments are completed in a simulated environment, both for safety and to comply with local regulations in Taiwan. We use AirSim, a drone simulation environment developed by Microsoft. This simulation environment not only provides numerous convenient APIs for development, but also renders realistic, high-quality scenes, so the images captured by the camera are not far from the real environment.
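
As an illustration of what such an AirSim training loop can look like, here is a minimal sketch using AirSim's Python API; the fixed velocity command stands in for the learned controller and is purely a placeholder:

```python
# Minimal AirSim loop: grab an RGB frame, read the simulated attitude, and
# send a velocity command. The constant command is a placeholder for a policy.
import numpy as np
import airsim

client = airsim.MultirotorClient()
client.confirmConnection()
client.enableApiControl(True)
client.armDisarm(True)
client.takeoffAsync().join()

for _ in range(100):
    # Request one uncompressed RGB frame from camera "0".
    response = client.simGetImages(
        [airsim.ImageRequest("0", airsim.ImageType.Scene, False, False)])[0]
    rgb = np.frombuffer(response.image_data_uint8, dtype=np.uint8)
    rgb = rgb.reshape(response.height, response.width, 3)  # BGR, 3 channels in recent releases

    # Current attitude from the simulated state (stands in for the IMU reading).
    state = client.getMultirotorState()
    orientation = state.kinematics_estimated.orientation

    # Placeholder for the learned controller: map observation to a velocity command.
    vx, vy, vz = 1.0, 0.0, 0.0
    client.moveByVelocityAsync(vx, vy, vz, duration=0.1).join()

client.landAsync().join()
client.armDisarm(False)
client.enableApiControl(False)
```
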
摘要 i
Abstract ii
Table of Contents iii
List of Algorithms v
List of Tables vi
List of Figures vii
1 Introduction 1
1.1 Motivation and Goals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Paper Survey . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.3 Selection of Sensors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.4 Problems of Multirotors Instability . . . . . . . . . . . . . . . . . . . . . . . . 6
1.5 Training Environment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.6 Organization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2 VAE-RNN-TD3 ( RTD3 ) 8
2.1 Capture Image Features . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.2 Training the Behavior of Multirotors . . . . . . . . . . . . . . . . . . . . . . . 11
2.2.1 Partially Observable Markov Decision Process ( POMDP ) . . . . . . . 11
2.2.2 TD3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.2.3 LSTM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.3 System Architecture ( RTD3 ) . . . . . . . . . . . . . . . . . . . . . . . . . . 15
3 Reinforcement Learning Optimal Control 17
3.1 Controller Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
3.1.1 MDP and Dynamic Model . . . . . . . . . . . . . . . . . . . . . . . . 18
3.1.2 Optimal Control . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
3.1.3 Reinforcement Learning . . . . . . . . . . . . . . . . . . . . . . . . . 22
3.2 Object Detection & Position Estimation . . . . . . . . . . . . . . . . . . . . . 25
3.2.1 Object Detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
3.2.2 Position Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
3.3 System Architecture ( RLOC ) . . . . . . . . . . . . . . . . . . . . . . . . . . 31
4 Simulation Results 32
4.1 Multirotor with RTD3 Model . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
4.1.1 Simulation Environment . . . . . . . . . . . . . . . . . . . . . . . . . 33
4.1.2 VAE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
4.1.3 RTD3 Training . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
4.1.4 Behavior of Multirotors . . . . . . . . . . . . . . . . . . . . . . . . . 39
4.2 Multirotor with RLOC Model . . . . . . . . . . . . . . . . . . . . . . . . . . 40
4.2.1 Simulation Environment . . . . . . . . . . . . . . . . . . . . . . . . . 40
4.2.2 HJB Controller . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
4.2.3 Object Detection by YOLO . . . . . . . . . . . . . . . . . . . . . . . 44
4.2.4 Angle Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
4.2.5 Position Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
4.2.6 Overall Performance . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
4.3 Comparison . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
4.3.1 RLOC vs. RTD3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
4.3.2 RLOC vs. Other Work . . . . . . . . . . . . . . . . . . . . . . . . . . 55
5 Conclusion 57
5.1 RTD3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
5.2 RLOC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
References 58
Electronic full text: publicly available online from 2026-08-30.