National Digital Library of Theses and Dissertations in Taiwan

Detailed Record

Author: 何長安
Author (English): Ho, Chang-An
Title: 基於安全性增強式學習之循序擾動學習演算法
Title (English): Safe Reinforcement Learning based Sequential Perturbation Learning Algorithm
Advisor: 林昇甫
Advisor (English): Lin, Sheng-Fuu
Degree: Master's
Institution: National Chiao Tung University
Department: Department of Electrical and Control Engineering
Discipline: Engineering
Field: Electrical and Computer Engineering
Thesis type: Academic thesis
Year of publication: 2009
Academic year of graduation: 97 (2008-2009)
Language: Chinese
Number of pages: 89
Keywords (Chinese): 安全性增強式學習、權值擾動、循序搜尋
Keywords (English): safe reinforcement learning; weight perturbation; sequential search
Metrics:
  • Cited by: 0
  • Views: 215
  • Score: (none)
  • Downloads: 0
  • Bookmarks: 0
This thesis proposes a sequential weight-perturbation scheme within a safe reinforcement learning framework, in which the concept of sequential search is used to apply a perturbation to every weight of the neural network. After the perturbations are applied, the networks before and after perturbation are evaluated against each other, and the weights are updated accordingly. This avoids the tendency of conventional perturbation learning algorithms to fall into local optima or to oscillate around a particular solution in the solution space, both of which slow learning. In addition, within the reinforcement learning framework, the set of goal states is defined through the energy of the plant; this design greatly reduces the time that conventional reinforcement learning spends over-searching the solution space for better solutions, so the plant's state can be driven into the goal-state set quickly. In the simulations, an n-mass inverted pendulum model is used for humanoid-robot tests, confirming that the learning algorithm proposed in this thesis performs notably well.
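As a rough illustration only (a minimal sketch, not the thesis's actual algorithm), a sequential weight-perturbation update of the kind described above can be written as follows. The cost function `evaluate`, the fixed step size, and the greedy accept/revert rule are all assumptions made for the example:

```python
import random

random.seed(0)  # for a reproducible toy run

def sequential_perturbation_step(weights, evaluate, step=0.01):
    """One pass of sequential weight perturbation (simplified sketch).

    Each weight is perturbed in turn; the perturbation is kept only
    when it improves the evaluation (here: lower cost is better),
    and reverted otherwise.
    """
    cost = evaluate(weights)
    for i in range(len(weights)):
        delta = step if random.random() < 0.5 else -step
        weights[i] += delta               # perturb a single weight
        new_cost = evaluate(weights)
        if new_cost < cost:
            cost = new_cost               # keep the improving perturbation
        else:
            weights[i] -= delta           # revert the perturbation
    return weights, cost

# Toy usage: drive two "weights" toward the minimum of a quadratic cost.
quadratic = lambda v: sum(x * x for x in v)
w = [1.0, -1.0]
c = quadratic(w)
for _ in range(500):
    w, c = sequential_perturbation_step(w, quadratic)
```

Because each weight is tested individually against the current cost, an unlucky perturbation is simply undone, which is what lets such a scheme avoid the oscillation described above; the price is one extra cost evaluation per weight per pass.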
This thesis presents a sequential perturbation learning architecture based on safe reinforcement learning (SRL-SP), which uses the concept of sequential search to apply perturbations to each weight of a neural network. After the perturbations are applied, the value functions of the pre-perturbation and post-perturbation networks are evaluated in order to update the weights. Applying perturbations keeps the solution from falling into a local optimum or oscillating in the solution space, both of which decrease learning efficiency. Furthermore, in the reinforcement learning structure, Lyapunov design methods are used to set the learning objective and a predefined set of goal states. This greatly reduces the learning time; in other words, it rapidly guides the plant's state into the goal set. In the simulations, an n-mass inverted pendulum model is used to perform humanoid-robot experiments, showing that the proposed method learns more effectively.
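The energy-defined goal set can be illustrated with a single pendulum. The constants, the tolerance, and the membership test below are assumptions made for this sketch, not values taken from the thesis:

```python
import math

M, L, G = 1.0, 1.0, 9.8   # assumed mass (kg), rod length (m), gravity (m/s^2)

def energy(theta, omega):
    """Total mechanical energy of a single pendulum, measured from the
    hanging rest state (theta = 0)."""
    kinetic = 0.5 * M * (L * omega) ** 2
    potential = M * G * L * (1 - math.cos(theta))
    return kinetic + potential

def in_goal_set(theta, omega, tol=0.05):
    """Membership test for an energy-defined goal set: a state is a goal
    state when its energy is within `tol` of the energy of the upright
    equilibrium, 2*M*G*L."""
    return abs(energy(theta, omega) - 2 * M * G * L) < tol

# The upright equilibrium itself belongs to the set ...
print(in_goal_set(math.pi, 0.0))   # True
# ... while the hanging rest state does not.
print(in_goal_set(0.0, 0.0))       # False
```

Instead of rewarding only one exact target state, the learner is rewarded for reaching any state in this set, which is what shrinks the effective search space and speeds up learning.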
Abstract (Chinese) i
Abstract (English) ii
Acknowledgements iii
Contents iv
List of Figures vi
List of Tables ix

Chapter 1 Introduction 1
1.1 Related Work 1
1.2 Motivation 3
1.3 Thesis Organization 6
Chapter 2 Background 7
2.1 Reinforcement Learning 7
2.2 Neural Networks 8
2.3 Gradient Methods and Weight Perturbation 12
2.4 Stability Theory 15
2.4.1 Stability Analysis 16
2.4.2 Lyapunov Stability Theory 17
2.4.3 Safe Reinforcement Learning 17
Chapter 3 Sequential Perturbation Learning in Reinforcement Learning 19
3.1 Energy-Function Design for the Reinforcement Learning Mechanism 19
3.2 Sequential-Search Perturbation Learning for the Agent 23
3.2.1 Simultaneous Weight-Perturbation Learning Algorithm 23
3.2.2 Sequential Weight-Perturbation Learning Algorithm 24
3.3 Reward Evaluation Mechanism 31
3.3.1 Judgment-Based Evaluation 32
3.3.2 Measurement-Based Evaluation 33
Chapter 4 Balance-Stabilization Simulations of a Humanoid Robot 35
4.1 Zero Moment Point 36
4.2 Single-Mass Pendulum Model 37
4.2.1 Mathematical Model of the Single-Mass Pendulum System 37
4.2.2 Simulation Tests of the Single-Mass Pendulum Model 39
4.3 Double-Mass Pendulum Model 51
4.3.1 Mathematical Model of the Double-Mass Pendulum System 51
4.3.2 Lyapunov Stability Analysis and Design 54
4.3.3 Simulation Tests of the Double-Mass Pendulum Model 58
4.4 n-Mass Pendulum Model 67
4.4.1 Mathematical Model of the Five-Mass Pendulum System 67
4.4.2 Planning of Key Walking Postures 70
4.4.3 Simulation Tests of the Five-Mass Pendulum Model 75
4.5 Simulation Analysis of the n-Mass Pendulum Model 78
Chapter 5 Conclusion 84
References 86
[1] R. J. Schalkoff, Artificial neural networks, New York: McGraw-Hill, 1997.
[2] S. Kumar, Neural networks: a classroom approach, New York: McGraw-Hill, 2005.
[3] R. S. Sutton and A. G. Barto, Reinforcement learning: an introduction, Cambridge: MIT Press, 1998.
[4] T. M. Mitchell, Machine learning, New York: McGraw-Hill, 1997.
[5] C. J. C. H. Watkins and P. Dayan, “Q-learning,” Machine Learning, vol. 8, no. 3, pp. 279-292, 1992.
[6] L. P. Kaelbling, M. L. Littman, and A. W. Moore, “Reinforcement learning: a survey,” Journal of Artificial Intelligence Research, vol. 4, pp. 237-285, 1996.
[7] P. Dayan and G. E. Hinton, “Using expectation-maximization for reinforcement learning,” Neural Computation, vol. 9, no. 2, pp. 271-278, 1997.
[8] F. Worgotter and B. Porr, “Temporal sequence learning, prediction, and control: a review of different models and their relation to biological mechanisms,” Neural Computation, vol. 17, no. 2, pp. 245-319, 2005.
[9] A. Gosavi, “Boundedness of iterates in Q-learning,” Systems & Control Letters, vol. 55, no. 4, pp. 347-349, 2006.
[10] A. Gosavi, “Reinforcement learning: a tutorial survey and recent advances,” INFORMS Journal on Computing, vol. 21, no. 2, pp. 178-192, 2008.
[11] R. S. Sutton, “Learning to predict by the methods of temporal differences,” Machine Learning, vol. 3, no. 1, pp. 9-44, 1988.
[12] G. Tesauro, “Practical issues in temporal difference learning,” Machine Learning, vol. 8, no. 3-4, pp. 257-277, 1992.
[13] R. E. Suri and W. Schultz, “Temporal difference model reproduces anticipatory neural activity,” Neural Computation, vol. 13, no. 4, pp. 841-861, 2001.
[14] C. F. Juang, J. Y. Lin, and C. T. Lin, “Genetic reinforcement learning through symbiotic evolution for fuzzy controller design,” IEEE Trans. Systems, Man, and Cybernetics, Part B, vol. 30, no. 2, pp. 290-302, 2000.
[15] C. J. Lin and Y. J. Xu, “Efficient reinforcement learning through dynamical symbiotic evolution for TSK-Type fuzzy controller design,” Int. Journal of General Systems, vol. 34, no.5, pp. 559-578, 2005.
[16] T. J. Perkins and A. G. Barto, “Lyapunov-constrained action sets for reinforcement learning,” in Proc. of the 18th Int. Conf. on Machine Learning, pp. 409-416, 2001.
[17] T. J. Perkins, “Lyapunov methods for safe intelligent agent design,” Ph. D. dissertation, University of Massachusetts Amherst, Amherst, Massachusetts, U.S., 2002.
[18] T. J. Perkins and A. G. Barto, “Lyapunov design for safe reinforcement learning,” Journal of Machine Learning Research, vol. 3, no. 4-5, pp. 803-832, 2003.
[19] M. J. Wooldridge and N. R. Jennings, “Agent theories, architectures and languages: a survey,” in Proc. of the ECAI-94 workshop on Agent Theories, Architectures and Languages, pp. 1-35, 1995.
[20] D. H. Ackley and M. S. Littman, “Generalization and scaling in reinforcement learning,” Advances in Neural Information Processing Systems, vol. 2, pp. 550-557, 1990.
[21] S. S. Keerthi and B. Ravindran, “A tutorial survey of reinforcement learning,” SADHANA - Academy Proc. in Engineering Sciences, vol. 19, no. 6, pp. 851-889, 1994.
[22] M. Jabri and B. Flower, “Weight perturbation an optimal architecture and learning technique for analog VLSI feed-forward and recurrent multilayer networks,” IEEE Trans. Neural Networks, vol. 3, no. 1, pp. 154-157, 1992.
[23] Y. Maeda and R. J. P. de Figueiredo, “Learning rules for neuro-controller via simultaneous perturbation,” IEEE Trans. Neural Networks, vol. 8, no. 5, pp. 1119-1130, 1997.
[24] Y. Maeda and M. Wakamura, “Simultaneous perturbation learning rule for recurrent neural networks and its FPGA implementation,” IEEE Trans. Neural Networks, vol. 16, no. 6, pp. 1664-1672, 2005.
[25] 林俊良, 控制系統數學 [Mathematics of Control Systems], 全華科技圖書, 2003 (in Chinese).
[26] H. K. Khalil, Nonlinear systems, New Jersey: Prentice-Hall, 2002.
[27] J. J. E. Slotine and W. Li, Applied nonlinear control, New Jersey: Prentice-Hall, 2005.
[28] Napoleon, S. Nakaura, and M. Sampei, “Balance control analysis of humanoid robot based on ZMP feedback control,” in Proc. of the IEEE/RSJ Int. Conf. on Intelligent Robots and Systems, pp. 2437-2442, 2002.
[29] T. Sugihara, Y. Nakamura, and H. Inoue, “Realtime humanoid motion generation through ZMP manipulation based on inverted pendulum control,” in Proc. of the 9th IEEE Int. Conf. on Robotics and Automation, pp. 11-15, 2002.
[30] H. Benbrahim and J. A. Franklin, “Biped dynamic walking using reinforcement learning,” Robotics and Autonomous Systems, vol. 22, no. 3-4, pp. 283-320, 1997.
[31] D. Katic and M. Vukobratovic, “Survey of intelligent control techniques for humanoid robots,” Journal of Intelligent and Robotic Systems, vol. 37, no. 2, pp. 117-141, 2003.
[32] M. Vukobratovic, B. Borovac, D. Surla, and D. Stokic, Biped locomotion: dynamics, stability, control and application, Berlin: Springer-Verlag, 1990.
[33] R. Lozano, I. Fantoni, and D. J. Block, “Stabilization of the inverted pendulum around its homoclinic orbit,” Systems & Control Letters, vol. 40, no. 3, pp. 197-204, 2002.
[34] K. J. Astrom and K. Furuta, “Swinging up a pendulum by energy control,” Automatica, vol. 36, no. 2, pp. 287-295, 2000.
[35] K. Furuta and M. Iwase, “Swing-up time analysis of pendulum,” Bulletin of the Polish Academy of Sciences - Technical Sciences, vol. 52, no. 3, pp. 153-163, 2004.
[36] I. Fantoni, R. Lozano, and M. W. Spong, “Energy based control of the pendulum,” IEEE Trans. Automatic Control, vol. 45, no. 4, pp. 725-729, 2000.
[37] J. A. C. Meesters, “The mechatronics kit first survey,” San Luis Potosi, Mexico, Tech. Rep. 2004-27, 2003.
[38] C. Popescu, “Nonlinear control of underactuated horizontal double pendulum,” M.S. thesis, Florida Atlantic University, Boca Raton, Florida, U.S., 2002.
[39] C. L. Karr, “Design of an adaptive fuzzy logic controller using a genetic algorithm,” in Proc. of the 4th Int. Conf. Genetic Algorithms, pp. 450-457, 1991.
[40] S. Kajita, T. Yamaura, and A. Kobayashi, “Dynamic walking control of a biped robot along a potential energy conserving orbit,” IEEE Trans. Robotics and Automation, vol. 8, no. 4, pp. 110-123, 1992.
[41] B. Song and J. W. Choi, “Robust nonlinear control for biped walking with a variable step size,” in Proc. of the SICE-ICASE Int. Joint Conf., pp. 3490-3495, 2006.
[42] K. A. De Jong, “Analysis of the behavior of class of genetic adaptive systems,” Ph. D. dissertation, University of Michigan, Ann Arbor, Michigan, U.S., 1975.
[43] C. M. Lin and C. H. Chen, “Robust fault-tolerant control for a biped robot using a recurrent cerebellar model articulation controller,” IEEE Trans. Systems, Man, and Cybernetics, Part B, vol. 37, no. 7, pp. 110-123, 2007.
[44] J. W. Grizzle, G. Abba, and F. Plestan, “Asymptotically stable walking for biped robots: analysis via systems with impulse effects,” IEEE Trans. Automatic Control, vol. 46, no. 1, pp. 725-729, 2001.
[45] T. Arakawa and T. Fukuda, “Natural motion generation of biped locomotion robot using hierarchical trajectory generation method consisting of GA, EP layers,” in Proc. of the 1997 IEEE Int. Conf. on Robotics and Automation, pp. 211-216, 1997.
[46] N. Peter, “Evolution of efficient gait with humanoid using visual feedback,” in Proc. of the 2nd Int. Conf. on Humanoid Robots, pp. 99-106, 2001.
[47] J. J. Grefenstette, “Optimization of control parameters for genetic algorithms,” IEEE Trans. System, Man, and Cybernetics, vol. 16, no. 1, pp. 122-128, 1986.