[1] R. Sutton and A. Barto, Reinforcement Learning: An Introduction, MIT Press, Cambridge, MA, USA, 1st edition, 1998.
[2] V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski, S. Petersen, C. Beattie, A. Sadik, I. Antonoglou, H. King, D. Kumaran, D. Wierstra, S. Legg, and D. Hassabis, “Human-level Control through Deep Reinforcement Learning”, Nature, 518(7540):529–533, 2015.
[3] R. J. Williams, “Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning”, Machine Learning, 8(3-4):229–256, 1992.
[4] A. Zell, “Chapter 5.2”, Simulation Neuronaler Netze, 1st edition, 1994.
[5] V. Mnih, A. P. Badia, M. Mirza, A. Graves, T. P. Lillicrap, T. Harley, D. Silver, and K. Kavukcuoglu, “Asynchronous Methods for Deep Reinforcement Learning”, arXiv preprint arXiv:1602.01783, 2016.
[6] T. Tieleman and G. Hinton, “Lecture 6.5-rmsprop: Divide the Gradient by a Running Average of Its Recent Magnitude”, COURSERA: Neural Networks for Machine Learning, 4(2), 2012.
[7] P. Mirowski, R. Pascanu, F. Viola, H. Soyer, A. Ballard, A. Banino, M. Denil, R. Goroshin, L. Sifre, K. Kavukcuoglu, et al., “Learning to Navigate in Complex Environments”, ICLR, 2017.
[8] R. Pascanu, C. Gulcehre, K. Cho, and Y. Bengio, “How to Construct Deep Recurrent Neural Networks”, arXiv preprint arXiv:1312.6026, 2013.
[9] B. Bakker, “Reinforcement Learning with Long Short-Term Memory”, NIPS, 1475–1482, 2001.
[10] P.-T. de Boer, D. P. Kroese, S. Mannor, and R. Y. Rubinstein, “A Tutorial on the Cross-Entropy Method”, Annals of Operations Research, 134(1):19–67, 2005.
[11] C. Beattie, et al., “DeepMind Lab”, arXiv preprint arXiv:1612.03801, 2016.
[12] D. Eigen, C. Puhrsch, and R. Fergus, “Depth Map Prediction from a Single Image using a Multi-scale Deep Network”, NIPS, 2366–2374, 2014.