[1] R. J. Schalkoff, Artificial Neural Networks, New York: McGraw-Hill, 1997.
[2] S. Kumar, Neural Networks: A Classroom Approach, New York: McGraw-Hill, 2005.
[3] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, Cambridge: MIT Press, 1998.
[4] T. M. Mitchell, Machine Learning, New York: McGraw-Hill, 1997.
[5] C. J. C. H. Watkins and P. Dayan, "Q-learning," Machine Learning, vol. 8, no. 3, pp. 279-292, 1992.
[6] L. P. Kaelbling, M. L. Littman, and A. W. Moore, "Reinforcement learning: a survey," Journal of Artificial Intelligence Research, vol. 4, pp. 237-285, 1996.
[7] P. Dayan and G. E. Hinton, "Using expectation-maximization for reinforcement learning," Neural Computation, vol. 9, no. 2, pp. 271-278, 1997.
[8] F. Wörgötter and B. Porr, "Temporal sequence learning, prediction, and control: a review of different models and their relation to biological mechanisms," Neural Computation, vol. 17, no. 2, pp. 245-319, 2005.
[9] A. Gosavi, "Boundedness of iterates in Q-learning," Systems & Control Letters, vol. 55, no. 4, pp. 347-349, 2006.
[10] A. Gosavi, "Reinforcement learning: a tutorial survey and recent advances," INFORMS Journal on Computing, vol. 21, no. 2, pp. 178-192, 2008.
[11] R. S. Sutton, "Learning to predict by the methods of temporal differences," Machine Learning, vol. 3, no. 1, pp. 9-44, 1988.
[12] G. Tesauro, "Practical issues in temporal difference learning," Machine Learning, vol. 8, no. 3-4, pp. 257-277, 1992.
[13] R. E. Suri and W. Schultz, "Temporal difference model reproduces anticipatory neural activity," Neural Computation, vol. 13, no. 4, pp. 841-861, 2001.
[14] C. F. Juang, J. Y. Lin, and C. T. Lin, "Genetic reinforcement learning through symbiotic evolution for fuzzy controller design," IEEE Trans. Systems, Man, and Cybernetics, Part B, vol. 30, no. 2, pp. 290-302, 2000.
[15] C. J. Lin and Y. J. Xu, "Efficient reinforcement learning through dynamical symbiotic evolution for TSK-type fuzzy controller design," Int. Journal of General Systems, vol. 34, no. 5, pp. 559-578, 2005.
[16] T. J. Perkins and A. G. Barto, "Lyapunov-constrained action sets for reinforcement learning," in Proc. of the 18th Int. Conf. on Machine Learning, pp. 409-416, 2001.
[17] T. J. Perkins, "Lyapunov methods for safe intelligent agent design," Ph.D. dissertation, University of Massachusetts Amherst, Amherst, Massachusetts, U.S., 2002.
[18] T. J. Perkins and A. G. Barto, "Lyapunov design for safe reinforcement learning," Journal of Machine Learning Research, vol. 3, no. 4-5, pp. 803-832, 2003.
[19] M. J. Wooldridge and N. R. Jennings, "Agent theories, architectures and languages: a survey," in Proc. of the ECAI-94 Workshop on Agent Theories, Architectures and Languages, pp. 1-35, 1995.
[20] D. H. Ackley and M. S. Littman, "Generalization and scaling in reinforcement learning," Advances in Neural Information Processing Systems, vol. 2, pp. 550-557, 1990.
[21] S. S. Keerthi and B. Ravindran, "A tutorial survey of reinforcement learning," SADHANA - Academy Proc. in Engineering Sciences, vol. 19, no. 6, pp. 851-889, 1994.
[22] M. Jabri and B. Flower, "Weight perturbation: an optimal architecture and learning technique for analog VLSI feed-forward and recurrent multilayer networks," IEEE Trans. Neural Networks, vol. 3, no. 1, pp. 154-157, 1992.
[23] Y. Maeda and R. J. P. de Figueiredo, "Learning rules for neuro-controller via simultaneous perturbation," IEEE Trans. Neural Networks, vol. 8, no. 5, pp. 1119-1130, 1997.
[24] Y. Maeda and M. Wakamura, "Simultaneous perturbation learning rule for recurrent neural networks and its FPGA implementation," IEEE Trans. Neural Networks, vol. 16, no. 6, pp. 1664-1672, 2005.
[25] C. L. Lin, Mathematics of Control Systems, Chuan Hwa Book Co., 2003 (in Chinese).
[26] H. K. Khalil, Nonlinear Systems, New Jersey: Prentice-Hall, 2002.
[27] J. J. E. Slotine and W. Li, Applied Nonlinear Control, New Jersey: Prentice-Hall, 2005.
[28] Napoleon, S. Nakaura, and M. Sampei, "Balance control analysis of humanoid robot based on ZMP feedback control," in Proc. of the IEEE/RSJ Int. Conf. on Intelligent Robots and Systems, pp. 2437-2442, 2002.
[29] T. Sugihara, Y. Nakamura, and H. Inoue, "Realtime humanoid motion generation through ZMP manipulation based on inverted pendulum control," in Proc. of the IEEE Int. Conf. on Robotics and Automation, pp. 11-15, 2002.
[30] H. Benbrahim and J. A. Franklin, "Biped dynamic walking using reinforcement learning," Robotics and Autonomous Systems, vol. 22, no. 3-4, pp. 283-320, 1997.
[31] D. Katic and M. Vukobratovic, "Survey of intelligent control techniques for humanoid robots," Journal of Intelligent and Robotic Systems, vol. 37, no. 2, pp. 117-141, 2003.
[32] M. Vukobratovic, B. Borovac, D. Surla, and D. Stokic, Biped Locomotion: Dynamics, Stability, Control and Application, Berlin: Springer-Verlag, 1990.
[33] R. Lozano, I. Fantoni, and D. J. Block, "Stabilization of the inverted pendulum around its homoclinic orbit," Systems & Control Letters, vol. 40, no. 3, pp. 197-204, 2000.
[34] K. J. Astrom and K. Furuta, "Swinging up a pendulum by energy control," Automatica, vol. 36, no. 2, pp. 287-295, 2000.
[35] K. Furuta and M. Iwase, "Swing-up time analysis of pendulum," Bulletin of the Polish Academy of Sciences - Technical Sciences, vol. 52, no. 3, pp. 153-163, 2004.
[36] I. Fantoni, R. Lozano, and M. W. Spong, "Energy based control of the pendulum," IEEE Trans. Automatic Control, vol. 45, no. 4, pp. 725-729, 2000.
[37] J. A. C. Meesters, "The mechatronics kit first survey," San Luis Potosi, Mexico, Tech. Rep. 2004-27, 2003.
[38] C. Popescu, "Nonlinear control of underactuated horizontal double pendulum," M.S. thesis, Florida Atlantic University, Boca Raton, Florida, U.S., 2002.
[39] C. L. Karr, "Design of an adaptive fuzzy logic controller using a genetic algorithm," in Proc. of the 4th Int. Conf. on Genetic Algorithms, pp. 450-457, 1991.
[40] S. Kajita, T. Yamaura, and A. Kobayashi, "Dynamic walking control of a biped robot along a potential energy conserving orbit," IEEE Trans. Robotics and Automation, vol. 8, no. 4, pp. 110-123, 1992.
[41] B. Song and J. W. Choi, "Robust nonlinear control for biped walking with a variable step size," in Proc. of the SICE-ICASE Int. Joint Conf., pp. 3490-3495, 2006.
[42] K. A. De Jong, "An analysis of the behavior of a class of genetic adaptive systems," Ph.D. dissertation, University of Michigan, Ann Arbor, Michigan, U.S., 1975.
[43] C. M. Lin and C. H. Chen, "Robust fault-tolerant control for a biped robot using a recurrent cerebellar model articulation controller," IEEE Trans. Systems, Man, and Cybernetics, Part B, vol. 37, no. 7, pp. 110-123, 2007.
[44] J. W. Grizzle, G. Abba, and F. Plestan, "Asymptotically stable walking for biped robots: analysis via systems with impulse effects," IEEE Trans. Automatic Control, vol. 46, no. 1, pp. 725-729, 2001.
[45] T. Arakawa and T. Fukuda, "Natural motion generation of biped locomotion robot using hierarchical trajectory generation method consisting of GA, EP layers," in Proc. of the 1997 IEEE Int. Conf. on Robotics and Automation, pp. 211-216, 1997.
[46] N. Peter, "Evolution of efficient gait with humanoid using visual feedback," in Proc. of the 2nd Int. Conf. on Humanoid Robots, pp. 99-106, 2001.
[47] J. J. Grefenstette, "Optimization of control parameters for genetic algorithms," IEEE Trans. Systems, Man, and Cybernetics, vol. 16, no. 1, pp. 122-128, 1986.