
National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)

Detailed Record

Author: 陳信璋
Author (English): CHEN, HSIN-CHANG
Title: 基於Actor-Critic強化學習之全車主動懸吊系統控制器設計
Title (English): Actor-Critic Reinforcement Learning for Controller Design of Full Vehicle Active Suspension System
Advisor: 林昱成
Advisor (English): LIN, YU-CHEN
Committee members: 林俊良、陳俊雄、陳孝武
Committee members (English): LIN, CHUN-LIANG; CHEN, TSUN-HSIUNG; CHEN, SHIAW-WU
Oral defense date: 2018-07-21
Degree: Master's
Institution: 逢甲大學 (Feng Chia University)
Department: 自動控制工程學系 (Department of Automatic Control Engineering)
Discipline: Engineering
Field of study: Electrical and Computer Engineering
Thesis type: Academic thesis
Year of publication: 2018
Academic year of graduation: 106
Language: Chinese
Number of pages: 90
Keywords (Chinese): 強化學習、Actor-Critic、麥花臣式懸吊系統、自適應最佳控制器、抑制振動、乘坐舒適性
Keywords (English): reinforcement learning; Actor-Critic; MacPherson suspension system; adaptive optimal controller; suppress vibration; ride comfort
Usage statistics:
  • Cited by: 0
  • Views: 328
  • Downloads: 11
  • Added to personal bibliography: 0
In recent years, as technology has advanced rapidly, vehicle technology has progressed in step, and suspension systems have been continually refined to balance comfort, handling, and safety. From passive suspensions with variable-stiffness springs and air springs, through semi-active suspensions with fixed or adjustable dynamic damping, to electronically controlled active suspension systems, active suspension has become the main direction of suspension development. This thesis uses an Actor-Critic architecture based on reinforcement learning to design a controller for a full-vehicle active suspension system, accounting for both external disturbances from uneven, bumpy road surfaces and system uncertainty caused by variations in the vehicle's load mass. Through continual correction during online learning, the controller learns an optimal policy and thereby finds the optimal active control force, so that the designed controller achieves both ride comfort and robust system stability.
The thesis first derives a mathematical model of a thirteen-degree-of-freedom MacPherson suspension system for a vehicle with four independently suspended wheels, comprising the vertical and lateral vibrations of the unsprung mass at each of the four wheels, together with the vertical and lateral motions and the pitch, roll, and yaw angles at the center of gravity of the sprung mass. We then design the active suspension controller using the proposed Actor-Critic reinforcement learning architecture. The Actor-Critic learning process resembles the mechanism by which dopamine is produced in the human brain and acts on motor neurons: dopamine reinforces specific behaviors by strengthening synaptic connections in the prefrontal system. Likewise, in an AI system a dopamine-like reward signal can be used to adjust the artificial synaptic weights (the neural network weights) so that the system learns the correct way to solve its task. Actor-Critic reinforcement learning is a temporal-difference (TD) learning method: the Critic evaluates how good an action (control force) is through its estimate of the current state-value function, while the Actor generates the output policy from the value function of the current state, expressed as a probability distribution. The Actor-Critic control method therefore provides a real-time, online-learning adaptive optimal controller design, and this thesis applies it to the controller design of a full-vehicle active suspension system subject to external disturbances and system uncertainty. Finally, vertical and lateral road irregularity models are constructed according to ISO 8608, the international standard for road roughness, and uncertainty in the vehicle load mass is considered, in order to carry out simulation analysis and quantitative comparison of the active and passive suspension systems. The results show that the proposed active suspension system effectively suppresses vibration, giving the vehicle both ride comfort and system robustness.

Keywords (Chinese): 強化學習、Actor-Critic、麥花臣式懸吊系統、自適應最佳控制器、抑制振動、乘坐舒適性
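
To make the critic/actor roles described above concrete, here is a minimal sketch, assuming linear value-function features and a Gaussian policy over the active control force; the feature map, learning rates, and reward are illustrative placeholders, not the thesis's actual design.

```python
import numpy as np

gamma = 0.99          # discount factor
alpha_critic = 1e-3   # critic (value-function) learning rate
alpha_actor = 1e-4    # actor (policy) learning rate
sigma = 0.1           # std. dev. of the Gaussian exploration policy

def phi(x):
    """Feature vector of the suspension state x (placeholder: the raw state itself)."""
    return np.asarray(x, dtype=float)

def select_action(x, theta):
    """Sample an active control force u ~ N(theta^T phi(x), sigma^2)."""
    return float(theta @ phi(x) + sigma * np.random.randn())

def actor_critic_update(x, u, reward, x_next, w, theta):
    """One TD(0) update after applying control u in state x and observing reward and x_next."""
    delta = reward + gamma * (w @ phi(x_next)) - (w @ phi(x))  # TD error: the "dopamine-like" signal
    w = w + alpha_critic * delta * phi(x)                      # critic: correct the value estimate
    grad_log_pi = (u - theta @ phi(x)) / sigma**2 * phi(x)     # gradient of log N(u; theta^T phi(x), sigma^2)
    theta = theta + alpha_actor * delta * grad_log_pi          # actor: policy-gradient step scaled by TD error
    return w, theta
```

In this sketch the same TD error both corrects the critic's value estimate and scales the actor's policy-gradient step, mirroring the dopamine analogy used in the abstract.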

Advances in science and technology continue to drive the evolution of intelligent vehicle technologies. To improve ride comfort, handling, and safety, vehicle suspension systems have become increasingly sophisticated: from the variable-stiffness and air springs of passive suspensions, to the fixed and adjustable dampers of semi-active suspensions, and on to electromagnetic active suspensions, active suspension has become a major trend in automotive engineering. This thesis focuses on the design of an active controller, based on an Actor-Critic reinforcement learning approach, for a full-vehicle active suspension system subject to system uncertainties arising from varying passenger load and to disturbance inputs from the road surface. The Actor-Critic reinforcement learning scheme creates an adaptive optimal controller online that adapts to these changes in the system dynamics while finding an optimal and efficient control strategy. The proposed active controller not only significantly suppresses the vertical and lateral acceleration of the car body, improving ride comfort, but also ensures performance robustness.
A 13-degree-of-freedom (DOF) MacPherson suspension model of a full vehicle with four independently suspended wheels is used, comprising the vertical and lateral vibrations of the unsprung mass at each of the four wheels, together with the vertical and lateral vibrations and the pitch, yaw, and roll angles of the sprung mass. The Actor-Critic learning process is similar to the production of dopamine in the human brain and the mechanism by which it acts on motor neurons: dopamine reinforces specific actions by strengthening synaptic connections in the frontal lobe. Similarly, in an artificial intelligence (AI) system, a dopamine-like reward signal can be used to adjust the weights of an artificial neural network so that the system finds the right way to solve its task. The Actor-Critic reinforcement learning scheme is based on temporal-difference (TD) learning, with the actor and critic assumed to operate on the same time scale. The critic predicts the sum of future rewards by comparing its estimated value function with the actual return through a TD learning rule, and the actor is updated by a policy gradient computed in an approximate gradient direction from the information provided by the critic; in other words, the critic evaluates the action (control law) using the estimated value function of the current state, and the actor generates a policy from that value function. We apply the proposed Actor-Critic reinforcement learning approach to a full-vehicle active suspension system with system uncertainty and external disturbance. Finally, we construct vertical and lateral road surface profiles according to the international standard ISO 8608 (road surface irregularities) and consider the system uncertainty caused by the vehicle load mass. Simulation results and quantitative comparisons between the active and passive suspension systems show that the proposed active suspension system effectively suppresses vibration induced by road surface irregularities and improves both ride comfort and system robustness.

Keywords: reinforcement learning; Actor-Critic; MacPherson suspension system; adaptive optimal controller; suppress vibration; ride comfort
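
The road-construction step mentioned above can be illustrated with a small sketch. Assuming the common sinusoid-superposition method and the ISO 8608 displacement PSD form Gd(n) = Gd(n0)·(n/n0)^(-2) with n0 = 0.1 cycle/m, the snippet below generates one road elevation profile; the function name, spatial-frequency range, and the class-C roughness value Gd(n0) = 256e-6 m^3 are assumptions for illustration, not the thesis's exact road model.

```python
import numpy as np

def iso8608_profile(length_m=200.0, dx=0.05, Gd_n0=256e-6, n0=0.1,
                    n_min=0.011, n_max=2.83, num_waves=200, seed=0):
    """Synthesize a road elevation profile z(x) whose PSD approximates the ISO 8608 form."""
    rng = np.random.default_rng(seed)
    x = np.arange(0.0, length_m, dx)                  # distance along the road [m]
    n = np.linspace(n_min, n_max, num_waves)          # spatial frequencies [cycles/m]
    dn = n[1] - n[0]
    Gd = Gd_n0 * (n / n0) ** (-2.0)                   # displacement PSD [m^3]
    amp = np.sqrt(2.0 * Gd * dn)                      # amplitude of each sinusoid [m]
    phase = rng.uniform(0.0, 2.0 * np.pi, num_waves)  # independent random phases
    # Superpose sinusoids: z(x) = sum_i amp_i * cos(2*pi*n_i*x + phase_i)
    z = (amp[None, :] * np.cos(2.0 * np.pi * x[:, None] * n[None, :] + phase)).sum(axis=1)
    return x, z

x, z = iso8608_profile()
print(f"RMS road elevation: {z.std():.4f} m over {x[-1]:.0f} m")
```

A second, independently phased profile would typically be generated for the lateral (or left/right wheel-track) irregularity.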

Acknowledgements i
Chinese Abstract ii
Abstract iv
Table of Contents vi
List of Figures viii
List of Tables x
List of Symbols xi
Chapter 1  Introduction 1
1.1 Research Background and Motivation 1
1.2 Related Work 3
1.3 Research Contributions 6
Chapter 2  System Dynamics 8
2.1 Four-Wheel Vehicle Active Suspension System 13
2.2 Discretization of the Continuous-Time System 20
2.3 Ride Comfort Evaluation 21
Chapter 3  Active Suspension Controller Design 23
3.1 Preliminaries 23
3.2 Introduction to Reinforcement Learning 24
3.3 System Architecture 36
3.4 Adaptive Optimal Controller Design via Actor-Critic Reinforcement Learning 37
Chapter 4  Experimental and Simulation Results 43
Chapter 5  Conclusions and Future Work 60
5.1 Conclusions 60
5.2 Future Work 60
References 62


[1].D. Hrovat, “Survey of advanced suspension developments and related optimal control applications,” Automatica, vol. 33, no. 10, pp. 1781-1817, 1997.
[2].M. Zapaterio, F. Pozo, H. R. Karimi, and N. Luo, “Semiactive control methodologies for suspension control with magnetorheological dampers,” Transactions on Mechatronics, vol. 17, no. 2, pp. 370-380, 2012.
[3].M. Zapateiro, N. Luo, H. R. Karimi, and J. Vehi, “Vibration control of a class of semiactive suspension system using neural network and backstepping techniques,” Mechanical Systems and Signal Processing, vol. 23, no. 6, pp. 1946-1953, 2009.
[4].R. Wang, H. Jing, H. R. Karimi, and N. Chen, “Robust fault-tolerant control of active suspension systems with finite-frequency constraint,” Mechanical Systems and Signal Processing, vol. 62, pp. 341-355, 2015.
[5].S. Aouaouda, M. Chadli, and H.R. Karimi, “Robust static output-feedback controller design against sensor failure for vehicle dynamics,” Control Theory & Applications, vol. 8, no. 9, pp. 728-737, 2014.
[6].H. Pan, W. Sun, H. Gao, and J. Yu, “Finite-time stabilization for vehicle active suspension systems with hard constraints,” Transactions on Intelligent Transportation Systems, vol. 16, no. 5, pp. 2663-2672, 2015.
[7].X. Zairong, C. Daizhan, L. Qiang, and M. Shengwei, “Nonlinear decentralized controller design for multimachine power systems using Hamiltonian function method,” Automatica, vol. 38, no. 3, pp. 527-534, 2002.
[8].J. M. Mendel, and R. W. MacLaren, “Reinforcement learning control and pattern recognition systems,” Mathematics in Science and Engineering, vol. 66, pp. 287-318, 1970.
[9].R. S. Sutton, and A. G. Barto, Reinforcement Learning: An Introduction, Cambridge, MA: MIT Press, 1998.
[10].W. Schultz, “Neural coding of basic reward terms of animal learning theory, game theory, microeconomics and behavioral ecology,” Current Opinion in Neurobiology, vol. 14, no. 2, pp. 139-147, 2004.
[11].K. Doya, H. Kimura, and M. Kawato, “Neural mechanisms for learning and control,” IEEE Control Systems, vol. 21, no. 4, pp. 42-54, 2001.
[12].P. J. Werbos, Approximate Dynamic Programming for Real-Time Control and Neural Modeling, Handbook of Intelligent Control, Van Nostrand Reinhold, 1992.
[13].E. Calin, “Supervised learning using an active strategy,” Procedia Technology, vol. 12, pp. 220-228, 2014.
[14].T. D. Sanger, “Optimal unsupervised learning in a single-layer linear feedforward neural network,” Neural Networks, vol. 2, no. 6, pp. 459-473, 1989.
[15].M. Tang, F. Nie, S. Pongpaichet, and R. Jain, “Semi-supervised learning on large-scale geotagged photos for situation recognition,” Journal of Visual Communication and Image Representation, vol. 48, pp. 310-316, 2017.
[16].H. Du, and N. Zhang, “Fuzzy control for nonlinear uncertain electrohydraulic active suspensions with input constraints,” Transactions on Fuzzy Systems, vol. 17, no. 2, pp. 343-356, 2009.
[17].J. Cao, H. Liu, P. Li, and D. J. Brown, “State of the art in vehicle active suspension adaptive control systems based on intelligent methodologies,” Transactions on Intelligent Transportation Systems, vol. 9, no. 3, pp. 392-405, 2008.
[18].W. Guosheng, C. Tianqing, and Wang Yanliang, “Robust control design and its simulation in vehicle active suspension systems,” Chinese Control Conference, pp. 16-18, 2008.
[19].R. Darus, and Y. M. Sam, “Modeling and control active suspension system for a full car model,” Signal Processing and Applications, vol. 10, pp. 13-18, 2009.
[20].I. Eski, and S. Yildirim, “Vibration control of vehicle active suspension system using a new robust neural network control system,” Simulation Modelling Practice and Theory, vol. 17, no. 5, pp. 778-793, 2009.
[21].W. Sun, H. Gao, and O. Kaynak, “Adaptive backstepping control for active suspension systems with hard constraints,” Transactions on Mechatronics, vol. 18, no. 3, pp. 1072-1079, 2013.
[22].Z. Jing, and W. Jue, “Adaptive tracking control of vehicle suspensions with actuator saturations,” Chinese Control Conference, pp. 28-30, 2015.
[23].H. P. Wang, I. Y. Ghazally, and M. Y. Tian, “Model-free fractional-order sliding mode control for an active vehicle suspension system,” Advances in Engineering Software, vol. 115, pp. 452-461, 2018.
[24].R. S. Sutton, A. G. Barto, and R. J. Williams, “Reinforcement learning is direct adaptive optimal control,” Control Systems, vol. 12, no. 2, pp. 19-22, 1992.
[25].R. E. Bellman, “A problem in the sequential design of experiments,” Sankhya, vol. 16, no. 3, pp. 221-229, 1956.
[26].R. E. Bellman, Dynamic Programming, Princeton University Press, 1957.
[27].R. E. Bellman, “A Markov decision process,” Journal of Mathematical Mechanics, vol. 6, no. 4, pp. 679-684, 1957.
[28].R. E. Bellman, and S. E. Dreyfus, “Functional approximations and dynamic programming,” Mathematical Computation, vol. 13, no. 68, pp. 247-251, 1959.
[29].R. E. Bellman, R. Kalaba, and B. Kotkin, “Polynomial approximation: a new computational technique in dynamic programming: allocation processes,” Mathematical Computation, vol. 17, no. 82, pp. 155-161, 1963.
[30].R. E. Bellman, Dynamic Programming, Princeton University Press, 1957.
[31].R. E. Bellman, “A Markov decision process,” Journal of Mathematical Mechanics, vol. 6, no. 4, pp. 679-684, 1957.
[32].W. S. Lovejoy, “A survey of algorithmic methods for partially observed Markov decision processes,” Annals of Operations Research, vol. 28, no. 1, pp. 47-65, 1991.
[33].R. A. Howard, Dynamic Programming and Markov Processes, MIT Press, 1960.
[34].B. Widrow, and M. E. Hoff, “Adaptive switching circuits,” Neurocomputing of Research, pp. 123-134, 1960.
[35].M. L. Tsetlin, Automaton Theory and Modeling of Biological Systems, Academic Press, 1973.
[36].A. G. Barto, and P. Anandan, “Pattern-recognizing stochastic learning automata,” Systems, Man, and Cybernetics, vol. 15. no. 3, pp. 360-375, 1985.
[37].P. J. Werbos, “Neural networks for control and system identification,” IEEE Conf. Decision and Control, pp. 260-265, 1989.
[38].D. P. Bertsekas, and J. N. Tsitsiklis, Neuro-Dynamic Programming, Athena Scientific, 1996.
[39].D. Han, and S. N. Balakrishnan, “State-constrained agile missile control with adaptive-critic-based neural networks,” IEEE Transactions on Control Systems Technology, vol. 10, no. 4, pp. 481-489, 2002.
[40].D. Prokhorov, Computational Intelligence in Automotive Applications. Springer-Verlag, 2008.
[41].S. Ferrari, and R. F. Stengel, “An adaptive critic global controller,” American Control Conference, pp. 2665-2670, 2002.
[42].G. G. Lendaris, L. Schultz, and T. Shannon, “Adaptive critic design for intelligent steering and speed control of a 2-axle vehicle,” IEEE conference Neural Networks, pp. 73-78, 2000.
[43].V. R. Konda, and J. N. Tsitsiklis, “Actor-critic algorithms,” Advances in Neural Information Processing Systems, pp. 1008-1014, 2000.
[44].A. G. Barto, R. S. Sutton, and C. W. Anderson, “Neuronlike adaptive elements that can solve difficult learning control problems,” IEEE Transactions on Systems, Man and Cybernetics, vol. 13, no. 5, pp. 834–846, 1983.
[45].H. Changchun, C. Jiannan, L. Yafeng, and L. Liang, “Adaptive prescribed performance control of half-car active suspension system with unknown dead-zone input,” Mechanical Systems and Signal Processing, vol. 111, pp. 135-148, 2018.
[46].T. Wuensche, H. K. Muhr, K. Biecker, and L. Schnaubelt, “Side load springs as a solution to minimize adverse side loads acting on the McPherson strut,” Society of Automotive Engineers, pp. 11-16, 1994.
[47].B. Nunnally, “HiPer Strut - removing the disadvantages of FWD?” CaddyInfo - Cadillac Conversations Blog, 2010.
[48].ISO 8608: Mechanical vibration – Road surface profiles – Reporting of measured data, Nov. 2016.
[49].M. Wang, “Damaging the car! 90% of drivers do not drive over speed bumps correctly,” 車民商城, 2017 (in Chinese).
[50].L. Wu, and C. K. Chen, “Research on hierarchical vehicle modeling and vibration control under vertical and lateral road irregularities,” Journal of Science and Engineering Technology, vol. 9, no. 1, pp. 1-15, 2013 (in Chinese).
[51].ISO 2631-1: Mechanical vibration and shock – Evaluation of human exposure to whole-body vibration, Jan. 1997.
[52].J. E. R. Staddon, “On the notion of cause, with applications to behaviorism,” Behaviorism, vol. 1, no. 1, pp. 25-63, 1973.
[53].R. J. Williams, “Simple statistical gradient-following algorithms for connectionist reinforcement learning,” Machine Learning, vol. 8, no. 3, pp. 229-256, 1992.
[54].G. Stephen, and W. L. M. John, “A neural network model of adaptively timed reinforcement learning and hippocampal dynamics,” Cognitive Brain Research, vol. 1, no. 1, pp. 3-38, 1992.
[55].J. M. Mendel, and R. W. McLaren, “Reinforcement-learning control and pattern recognition systems,” Mathematics in Science and Engineering, vol. 66, pp. 287-318, 1970.
[56].D. Silver, “UCL Course on RL,” University College London, 2015.
[57].K. Chayka, “How are ‘flappy bird’ and ‘candy crush’ still making so much money”, Pacific Standard, 2014.
[58].R. S. Sutton, and A. G. Barto, Reinforcement Learning An Introduction, The MIT Press, 2012.
[59].L. Kocsis, and C. Szepesvari, “Bandit based monte-carlo planning,” European Conference on Machine Learning, vol. 4212, pp. 282-293, 2006.
[60].E. M. Pablo, M. M. Jose, M. G. Jose, S. O. Emilio, and G. S. Juan, “Least-squares temporal difference learning based on an extreme learning machine,” Neurocomputing, vol. 141, no. 2, pp. 37-45, 2014.
[61].P. Stone, R. S. Sutton, and G. Kuhlmann, “Reinforcement learning for roboCup soccer keepaway,” Adaptive Behavior, vol. 13, no. 3, pp. 165-188, 2005.
[62].A. M. S. Barreto, and C. W. Anderson, “Restricted gradient-descent algorithm for value-function approximation in reinforcement learning,” Artificial Intelligence, vol. 172, no. 4-5, pp. 454-482, 2008.
[63].I. Carlucho, M. D. Paula, S. A. Villar, and G. G. Acosta, “Incremental Q-learning strategy for adaptive PID control of mobile robots,” Expert Systems with Applications, vol. 80, no. 1, pp. 183-199, 2017.
