National Digital Library of Theses and Dissertations in Taiwan

Detailed Record

Author: 何長安
Author (English): Ho, Chang-An
Title: 基於安全性增強式學習之循序擾動學習演算法
Title (English): Safe Reinforcement Learning based Sequential Perturbation Learning Algorithm
Advisor: 林昇甫
Advisor (English): Lin, Sheng-Fuu
Degree: Master's
Institution: National Chiao Tung University
Department: Department of Electrical and Control Engineering
Discipline: Engineering
Field: Electrical and Computer Engineering
Thesis type: Academic thesis
Year of publication: 2009
Academic year of graduation: 97 (2008-2009)
Language: Chinese
Number of pages: 89
Keywords (Chinese): 安全性增強式學習、權值擾動、循序搜尋
Keywords (English): safe reinforcement learning; weight perturbation; sequential search
Metrics:
  • Cited by: 0
  • Views: 215
  • Score: (none)
  • Downloads: 0
  • Bookmarks: 0
This thesis proposes a sequential weight-perturbation scheme within a safe reinforcement learning framework, in which the concept of sequential search is used to apply a perturbation to every weight of the neural network. After the perturbations are applied, the networks before and after perturbation are evaluated against each other, and the weights are updated accordingly. This avoids the tendency of conventional perturbation learning algorithms to fall into local optima or to oscillate around a particular solution in the solution space, both of which slow learning. In addition, within the reinforcement learning framework, the set of goal states is defined through the energy of the plant; this design greatly reduces the time that conventional reinforcement learning spends over-searching the solution space for better solutions, so the plant's state can be driven into the goal-state set quickly. In the simulations, an n-mass inverted pendulum model is used for humanoid-robot tests, confirming that the learning algorithm proposed in this thesis performs notably well.
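As a rough illustration only (a minimal sketch, not the thesis's actual algorithm), a sequential weight-perturbation update of the kind described above can be written as follows. The cost function `evaluate`, the fixed step size, and the greedy accept/revert rule are all assumptions made for the example:

```python
import random

random.seed(0)  # for a reproducible toy run

def sequential_perturbation_step(weights, evaluate, step=0.01):
    """One pass of sequential weight perturbation (simplified sketch).

    Each weight is perturbed in turn; the perturbation is kept only
    when it improves the evaluation (here: lower cost is better),
    and reverted otherwise.
    """
    cost = evaluate(weights)
    for i in range(len(weights)):
        delta = step if random.random() < 0.5 else -step
        weights[i] += delta               # perturb a single weight
        new_cost = evaluate(weights)
        if new_cost < cost:
            cost = new_cost               # keep the improving perturbation
        else:
            weights[i] -= delta           # revert the perturbation
    return weights, cost

# Toy usage: drive two "weights" toward the minimum of a quadratic cost.
quadratic = lambda v: sum(x * x for x in v)
w = [1.0, -1.0]
c = quadratic(w)
for _ in range(500):
    w, c = sequential_perturbation_step(w, quadratic)
```

Because each weight is tested individually against the current cost, an unlucky perturbation is simply undone, which is what lets such a scheme avoid the oscillation described above; the price is one extra cost evaluation per weight per pass.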
This thesis presents a sequential perturbation learning architecture based on safe reinforcement learning (SRL-SP), which uses the concept of sequential search to apply perturbations to each weight of a neural network. After the perturbations are applied, the value functions of the pre-perturbation and post-perturbation networks are evaluated in order to update the weights. Applying perturbations keeps the solution from falling into a local optimum or oscillating in the solution space, both of which decrease learning efficiency. Furthermore, in the reinforcement learning structure, Lyapunov design methods are used to set the learning objective and a predefined set of goal states. This greatly reduces the learning time; in other words, it rapidly guides the plant's state into the goal set. In the simulations, an n-mass inverted pendulum model is used to perform humanoid-robot experiments, showing that the proposed method learns more effectively.
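The energy-defined goal set can be illustrated with a single pendulum. The constants, the tolerance, and the membership test below are assumptions made for this sketch, not values taken from the thesis:

```python
import math

M, L, G = 1.0, 1.0, 9.8   # assumed mass (kg), rod length (m), gravity (m/s^2)

def energy(theta, omega):
    """Total mechanical energy of a single pendulum, measured from the
    hanging rest state (theta = 0)."""
    kinetic = 0.5 * M * (L * omega) ** 2
    potential = M * G * L * (1 - math.cos(theta))
    return kinetic + potential

def in_goal_set(theta, omega, tol=0.05):
    """Membership test for an energy-defined goal set: a state is a goal
    state when its energy is within `tol` of the energy of the upright
    equilibrium, 2*M*G*L."""
    return abs(energy(theta, omega) - 2 * M * G * L) < tol

# The upright equilibrium itself belongs to the set ...
print(in_goal_set(math.pi, 0.0))   # True
# ... while the hanging rest state does not.
print(in_goal_set(0.0, 0.0))       # False
```

Instead of rewarding only one exact target state, the learner is rewarded for reaching any state in this set, which is what shrinks the effective search space and speeds up learning.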
Abstract (Chinese) i
Abstract (English) ii
Acknowledgements iii
Contents iv
List of Figures vi
List of Tables ix

Chapter 1 Introduction 1
1.1 Related Work 1
1.2 Motivation 3
1.3 Thesis Organization 6
Chapter 2 Background 7
2.1 Reinforcement Learning 7
2.2 Neural Networks 8
2.3 Gradient Methods and Weight Perturbation 12
2.4 Stability Theory 15
2.4.1 Stability Analysis 16
2.4.2 Lyapunov Stability Theory 17
2.4.3 Safe Reinforcement Learning 17
Chapter 3 Sequential Perturbation Learning in Reinforcement Learning 19
3.1 Energy-Function Design for the Reinforcement Learning Mechanism 19
3.2 Sequential-Search Perturbation Learning for the Agent 23
3.2.1 Simultaneous Weight-Perturbation Learning Algorithm 23
3.2.2 Sequential Weight-Perturbation Learning Algorithm 24
3.3 Reward Evaluation Mechanism 31
3.3.1 Judgment-Based Evaluation 32
3.3.2 Measurement-Based Evaluation 33
Chapter 4 Balance-Stabilization Simulations of a Humanoid Robot 35
4.1 Zero Moment Point 36
4.2 Single-Mass Pendulum Model 37
4.2.1 Mathematical Model of the Single-Mass Pendulum System 37
4.2.2 Simulation Tests of the Single-Mass Pendulum Model 39
4.3 Double-Mass Pendulum Model 51
4.3.1 Mathematical Model of the Double-Mass Pendulum System 51
4.3.2 Lyapunov Stability Analysis and Design 54
4.3.3 Simulation Tests of the Double-Mass Pendulum Model 58
4.4 n-Mass Pendulum Model 67
4.4.1 Mathematical Model of the Five-Mass Pendulum System 67
4.4.2 Planning of Key Walking Postures 70
4.4.3 Simulation Tests of the Five-Mass Pendulum Model 75
4.5 Simulation Analysis of the n-Mass Pendulum Model 78
Chapter 5 Conclusion 84
References 86
[1] R. J. Schalkoff, Artificial neural networks, New York: McGraw-Hill, 1997.
[2] S. Kumar, Neural networks: a classroom approach, New York: McGraw-Hill, 2005.
[3] R. S. Sutton and A. G. Barto, Reinforcement learning: an introduction, Cambridge: MIT Press, 1998.
[4] T. M. Mitchell, Machine learning, New York: McGraw-Hill, 1997.
[5] C. J. C. H. Watkins and P. Dayan, “Q-learning,” Machine Learning, vol. 8, no. 3, pp. 279-292, 1992.
[6] L. P. Kaelbling, M. L. Littman, and A. W. Moore, “Reinforcement learning: a survey,” Journal of Artificial Intelligence Research, vol. 4, pp. 237-285, 1996.
[7] P. Dayan and G. E. Hinton, “Using expectation-maximization for reinforcement learning,” Neural Computation, vol. 9, no. 2, pp. 271-278, 1997.
[8] F. Worgotter and B. Porr, “Temporal sequence learning, prediction, and control: a review of different models and their relation to biological mechanisms,” Neural Computation, vol. 17, no. 2, pp. 245-319, 2005.
[9] A. Gosavi, “Boundedness of iterates in Q-learning,” Systems & Control Letters, vol. 55, no. 4, pp. 347-349, 2006.
[10] A. Gosavi, “Reinforcement learning: a tutorial survey and recent advances,” INFORMS Journal on Computing, vol. 21, no. 2, pp. 178-192, 2008.
[11] R. S. Sutton, “Learning to predict by the methods of temporal differences,” Machine Learning, vol. 3, no. 1, pp. 9-44, 1988.
[12] G. Tesauro, “Practical issues in temporal difference learning,” Machine Learning, vol. 8, no. 3-4, pp. 257-277, 1992.
[13] R. E. Suri and W. Schultz, “Temporal difference model reproduces anticipatory neural activity,” Neural Computation, vol. 13, no. 4, pp. 841-861, 2001.
[14] C. F. Juang, J. Y. Lin, and C. T. Lin, “Genetic reinforcement learning through symbiotic evolution for fuzzy controller design,” IEEE Trans. Systems, Man, and Cybernetics, Part B, vol. 30, no. 2, pp. 290-302, 2000.
[15] C. J. Lin and Y. J. Xu, “Efficient reinforcement learning through dynamical symbiotic evolution for TSK-Type fuzzy controller design,” Int. Journal of General Systems, vol. 34, no.5, pp. 559-578, 2005.
[16] T. J. Perkins and A. G. Barto, “Lyapunov-constrained action sets for reinforcement learning,” in Proc. of the 18th Int. Conf. on Machine Learning, pp. 409-416, 2001.
[17] T. J. Perkins, “Lyapunov methods for safe intelligent agent design,” Ph. D. dissertation, University of Massachusetts Amherst, Amherst, Massachusetts, U.S., 2002.
[18] T. J. Perkins and A. G. Barto, “Lyapunov design for safe reinforcement learning,” Journal of Machine Learning Research, vol. 3, no. 4-5, pp. 803-832, 2003.
[19] M. J. Wooldridge and N. R. Jennings, “Agent theories, architectures and languages: a survey,” in Proc. of the ECAI-94 workshop on Agent Theories, Architectures and Languages, pp. 1-35, 1995.
[20] D. H. Ackley and M. S. Littman, “Generalization and scaling in reinforcement learning,” Advances in Neural Information Processing Systems, vol. 2, pp. 550-557, 1990.
[21] S. S. Keerthi and B. Ravindran, “A tutorial survey of reinforcement learning,” SADHANA - Academy Proc. in Engineering Sciences, vol. 19, no. 6, pp. 851-889, 1994.
[22] M. Jabri and B. Flower, “Weight perturbation an optimal architecture and learning technique for analog VLSI feed-forward and recurrent multilayer networks,” IEEE Trans. Neural Networks, vol. 3, no. 1, pp. 154-157, 1992.
[23] Y. Maeda and R. J. P. de Figueiredo, “Learning rules for neuro-controller via simultaneous perturbation,” IEEE Trans. Neural Networks, vol. 8, no. 5, pp. 1119-1130, 1997.
[24] Y. Maeda and M. Wakamura, “Simultaneous perturbation learning rule for recurrent neural networks and its FPGA implementation,” IEEE Trans. Neural Networks, vol. 16, no. 6, pp. 1664-1672, 2005.
[25] 林俊良, 控制系統數學 [Mathematics of Control Systems], 全華科技圖書, 2003 (in Chinese).
[26] H. K. Khalil, Nonlinear systems, New Jersey: Prentice-Hall, 2002.
[27] J. J. E. Slotine and W. Li, Applied nonlinear control, New Jersey: Prentice-Hall, 2005.
[28] Napoleon, S. Nakaura, and M. Sampei, “Balance control analysis of humanoid robot based on ZMP feedback control,” in Proc. of the IEEE/RSJ Int. Conf. on Intelligent Robots and Systems, pp. 2437-2442, 2002.
[29] T. Sugihara, Y. Nakamura, and H. Inoue, “Realtime humanoid motion generation through ZMP manipulation based on inverted pendulum control,” in Proc. of the 9th IEEE Int. Conf. on Robotics and Automation, pp. 11-15, 2002.
[30] H. Benbrahim and J. A. Franklin, “Biped dynamic walking using reinforcement learning,” Robotics and Autonomous Systems, vol. 22, no. 3-4, pp. 283-320, 1997.
[31] D. Katic and M. Vukobratovic, “Survey of intelligent control techniques for humanoid robots,” Journal of Intelligent and Robotic Systems, vol. 37, no. 2, pp. 117-141, 2003.
[32] M. Vukobratovic, B. Borovac, D. Surla, and D. Stokic, Biped locomotion: dynamics, stability, control and application, Berlin: Springer-Verlag, 1990.
[33] R. Lozano, I. Fantoni, and D. J. Block, “Stabilization of the inverted pendulum around its homoclinic orbit,” Systems & Control Letters, vol. 40, no. 3, pp. 197-204, 2002.
[34] K. J. Astrom and K. Furuta, “Swinging up a pendulum by energy control,” Automatica, vol. 36, no. 2, pp. 287-295, 2000.
[35] K. Furuta and M. Iwase, “Swing-up time analysis of pendulum,” Bulletin of the Polish Academy of Sciences - Technical Sciences, vol. 52, no. 3, pp. 153-163, 2004.
[36] I. Fantoni, R. Lozano, and M. W. Spong, “Energy based control of the pendulum,” IEEE Trans. Automatic Control, vol. 45, no. 4, pp. 725-729, 2000.
[37] J. A. C. Meesters, “The mechatronics kit first survey,” San Luis Potosi, Mexico, Tech. Rep. 2004-27, 2003.
[38] C. Popescu, “Nonlinear control of underactuated horizontal double pendulum,” M.S. thesis, Florida Atlantic University, Boca Raton, Florida, U.S., 2002.
[39] C. L. Karr, “Design of an adaptive fuzzy logic controller using a genetic algorithm,” in Proc. of the 4th Int. Conf. Genetic Algorithms, pp. 450-457, 1991.
[40] S. Kajita, T. Yamaura, and A. Kobayashi, “Dynamic walking control of a biped robot along a potential energy conserving orbit,” IEEE Trans. Robotics and Automation, vol. 8, no. 4, pp. 110-123, 1992.
[41] B. Song and J. W. Choi, “Robust nonlinear control for biped walking with a variable step size,” in Proc. of the SICE-ICASE Int. Joint Conf., pp. 3490-3495, 2006.
[42] K. A. De Jong, “Analysis of the behavior of class of genetic adaptive systems,” Ph. D. dissertation, University of Michigan, Ann Arbor, Michigan, U.S., 1975.
[43] C. M. Lin and C. H. Chen, “Robust fault-tolerant control for a biped robot using a recurrent cerebellar model articulation controller,” IEEE Trans. Systems, Man, and Cybernetics, Part B, vol. 37, no. 7, pp. 110-123, 2007.
[44] J. W. Grizzle, G. Abba, and F. Plestan, “Asymptotically stable walking for biped robots: analysis via systems with impulse effects,” IEEE Trans. Automatic Control, vol. 46, no. 1, pp. 725-729, 2001.
[45] T. Arakawa and T. Fukuda, “Natural motion generation of biped locomotion robot using hierarchical trajectory generation method consisting of GA, EP layers,” in Proc. of the 1997 IEEE Int. Conf. on Robotics and Automation, pp. 211-216, 1997.
[46] N. Peter, “Evolution of efficient gait with humanoid using visual feedback,” in Proc. of the 2nd Int. Conf. on Humanoid Robots, pp. 99-106, 2001.
[47] J. J. Grefenstette, “Optimization of control parameters for genetic algorithms,” IEEE Trans. System, Man, and Cybernetics, vol. 16, no. 1, pp. 122-128, 1986.