|
[1] R. S. Sutton and A. G. Barto, Introduction to Reinforcement Learning. Cambridge, MA, USA: MIT Press, 1st ed., 1998. [2] D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre, G. Van Den Driessche, J. Schrit- twieser, I. Antonoglou, V. Panneershelvam, M. Lanctot, et al., “Mastering the game of go with deep neural networks and tree search, nature, vol. 529, no. 7587, p. 484, 2016. [3] V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski, et al., “Human-level control through deep reinforcement learning, Nature, vol. 518, no. 7540, p. 529, 2015. [4] M. G. Bellemare, G. Ostrovski, A. Guez, P. S. Thomas, and R. Munos, “Increasing the action gap: New operators for reinforcement learning, CoRR, vol. abs/1512.04860, 2015. [5] L. C. Baird III, “Reinforcement learning through gradient descent, tech. rep., CARNEGIE-MELLON UNIV PITTSBURGH PA DEPT OF COMPUTER SCIENCE, 1999. [6] R. S. Sutton, “Learning to predict by the methods of temporal differences, Machine learning, vol. 3, no. 1, pp. 9–44, 1988. [7] H. V. Hasselt, “Double q-learning, in Advances in Neural Information Processing Sys- tems, pp. 2613–2621, 2010. [8] H. Van Hasselt, A. Guez, and D. Silver, “Deep reinforcement learning with double q- learning, in Thirtieth AAAI Conference on Artificial Intelligence, 2016. [9] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep con- volutional neural networks, in Advances in neural information processing systems, pp. 1097–1105, 2012. [10] S. Han, H. Mao, and W. J. Dally, “Deep compression: Compressing deep neu- ral networks with pruning, trained quantization and huffman coding, arXiv preprint arXiv:1510.00149, 2015. [11] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, “Going deeper with convolutions, in Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 1–9, 2015. [12] S. Ioffe and C. Szegedy, “Batch normalization: Accelerating deep network training by reducing internal covariate shift, arXiv preprint arXiv:1502.03167, 2015. [13] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna, “Rethinking the inception architecture for computer vision, in Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 2818–2826, 2016. [14] C. Szegedy, S. Ioffe, V. Vanhoucke, and A. A. Alemi, “Inception-v4, inception-resnet and the impact of residual connections on learning, in Thirty-First AAAI Conference on Artificial Intelligence, 2017. [15] F. Chollet, “Xception: Deep learning with depthwise separable convolutions, in Pro- ceedings of the IEEE conference on computer vision and pattern recognition, pp. 1251– 1258, 2017. [16] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition, in Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 770–778, 2016. [17] K. He, X. Zhang, S. Ren, and J. Sun, “Identity mappings in deep residual networks, in European conference on computer vision, pp. 630–645, Springer, 2016. [18] S. Xie, R. Girshick, P. Dollár, Z. Tu, and K. He, “Aggregated residual transformations for deep neural networks, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1492–1500, 2017. [19] B. Zoph, V. Vasudevan, J. Shlens, and Q. V. Le, “Learning transferable architectures for scalable image recognition, in Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 8697–8710, 2018. [20] S. Sabour, N. Frosst, and G. E. Hinton, “Dynamic routing between capsules, in Ad- vances in neural information processing systems, pp. 3856–3866, 2017. [21] G. E. Hinton, S. Sabour, and N. Frosst, “Matrix capsules with EM routing, in Interna- tional Conference on Learning Representations, 2018. [22] D. E. Rumelhart, G. E. Hinton, and R. J. Williams, “Learning internal representations by error propagation, tech. rep., California Univ San Diego La Jolla Inst for Cognitive Science, 1985. [23] R. Girshick, J. Donahue, T. Darrell, and J. Malik, “Rich feature hierarchies for accurate object detection and semantic segmentation, in Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 580–587, 2014. [24] R. Girshick, “Fast r-cnn, in Proceedings of the IEEE international conference on com- puter vision, pp. 1440–1448, 2015. [25] S. Ren, K. He, R. Girshick, and J. Sun, “Faster r-cnn: Towards real-time object detection with region proposal networks, in Advances in neural information processing systems, pp. 91–99, 2015. [26] K. He, G. Gkioxari, P. Dollár, and R. Girshick, “Mask r-cnn, in Proceedings of the IEEE international conference on computer vision, pp. 2961–2969, 2017. [27] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, “You only look once: Unified, real-time object detection, in Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 779–788, 2016. [28] A. G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto, and H. Adam, “Mobilenets: Efficient convolutional neural networks for mobile vision applications, arXiv preprint arXiv:1704.04861, 2017. [29] S. Hochreiter and J. Schmidhuber, “Long short-term memory, Neural computation, vol. 9, no. 8, pp. 1735–1780, 1997. [30] W.-C. Chien, H.-Y. Weng, C.-F. Lai, Z. Fan, H.-C. Chao, and Y. Hu, “A sfc-based ac- cess point switching mechanism for software-defined wireless network in iov, Future Generation Computer Systems, vol. 98, pp. 577 – 585, 2019. [31] I. Sutskever, O. Vinyals, and Q. V. Le, “Sequence to sequence learning with neural net- works, in Advances in neural information processing systems, pp. 3104–3112, 2014. [32] O. Vinyals, A. Toshev, S. Bengio, and D. Erhan, “Show and tell: A neural image cap- tion generator, in Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 3156–3164, 2015. [33] K. Xu, J. Ba, R. Kiros, K. Cho, A. Courville, R. Salakhudinov, R. Zemel, and Y. Ben- gio, “Show, attend and tell: Neural image caption generation with visual attention, in International conference on machine learning, pp. 2048–2057, 2015. [34] T. Rocktäschel, E. Grefenstette, K. M. Hermann, T. Kočiskỳ, and P. Blunsom, “Rea- soning about entailment with neural attention, arXiv preprint arXiv:1509.06664, 2015. [35] D. Bahdanau, K. Cho, and Y. Bengio, “Neural machine translation by jointly learning to align and translate, arXiv preprint arXiv:1409.0473, 2014. [36] K. Cho, B. Van Merriënboer, C. Gulcehre, D. Bahdanau, F. Bougares, H. Schwenk, and Y. Bengio, “Learning phrase representations using rnn encoder-decoder for statistical machine translation, arXiv preprint arXiv:1406.1078, 2014. [37] M.-T. Luong, H. Pham, and C. D. Manning, “Effective approaches to attention-based neural machine translation, arXiv preprint arXiv:1508.04025, 2015. [38] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, “Attention is all you need, in Advances in Neural Information Process- ing Systems, pp. 5998–6008, 2017. [39] R. Y. Rubinstein, “Optimization of computer simulation models with rare events, Eu- ropean Journal of Operational Research, vol. 99, no. 1, pp. 89 – 112, 1997. [40] R. Y. Rubinstein, Combinatorial Optimization, Cross-Entropy, Ants and Rare Events, pp. 303–363. Boston, MA: Springer US, 2001. [41] P.-T. de Boer, D. P. Kroese, S. Mannor, and R. Y. Rubinstein, “A tutorial on the cross- entropy method, Annals of Operations Research, vol. 134, pp. 19–67, Feb 2005. [42] L. Rosasco, E. D. Vito, A. Caponnetto, M. Piana, and A. Verri, “Are loss functions all the same?, Neural Computation, vol. 16, no. 5, pp. 1063–1076, 2004. [43] T. Chai and R. R. Draxler, “Root mean square error (rmse) or mean absolute error (mae)?–arguments against avoiding rmse in the literature, Geoscientific model devel- opment, vol. 7, no. 3, pp. 1247–1250, 2014. [44] L. Bottou, “Large-scale machine learning with stochastic gradient descent, in Proceed- ings of COMPSTAT’2010, pp. 177–186, Springer, 2010. [45] A. Cauchy, “Méthode générale pour la résolution des systemes d’équations simul- tanées, Comp. Rend. Sci. Paris, vol. 25, no. 1847, pp. 536–538, 1847. [46] H. Robbins and S. Monro, “A stochastic approximation method, Ann. Math. Statist., vol. 22, pp. 400–407, 09 1951. [47] J. Kiefer and J. Wolfowitz, “Stochastic estimation of the maximum of a regression func- tion, Ann. Math. Statist., vol. 23, pp. 462–466, 09 1952. [48] L. Bottou, F. E. Curtis, and J. Nocedal, “Optimization methods for large-scale machine learning, Siam Review, vol. 60, no. 2, pp. 223–311, 2018. [49] J. Śniatycki and A. Weinstein, “Reduction and quantization for singular momentum mappings, Letters in Mathematical Physics, vol. 7, pp. 155–161, Mar 1983. [50] Y. E. Nesterov, “A method for solving the convex programming problem with conver- gence rate o (1/k^ 2), in Dokl. akad. nauk Sssr, vol. 269, pp. 543–547, 1983. [51] J. Duchi, E. Hazan, and Y. Singer, “Adaptive subgradient methods for online learning and stochastic optimization, Journal of Machine Learning Research, vol. 12, no. Jul, pp. 2121–2159, 2011. [52] T. Tieleman and G. Hinton, “Lecture 6.5—RmsProp: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learn- ing, 2012. [53] M. D. Zeiler, “Adadelta: an adaptive learning rate method, arXiv preprint arXiv:1212.5701, 2012. [54] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization, arXiv preprint arXiv:1412.6980, 2014. [55] T. Dozat, “Incorporating nesterov momentum into adam, 2016. [56] I. Sutskever, J. Martens, G. Dahl, and G. Hinton, “On the importance of initialization and momentum in deep learning, in International conference on machine learning, pp. 1139–1147, 2013. [57] S. J. Reddi, S. Kale, and S. Kumar, “On the convergence of adam and beyond, in International Conference on Learning Representations, 2018. [58] G. Lample and D. S. Chaplot, “Playing fps games with deep reinforcement learning, in Thirty-First AAAI Conference on Artificial Intelligence, 2017. [59] T. C. Wu, S. Y. Tseng, C. F. Lai, C. Y. Ho, and Y. S. Lai, “Navigating assistance system for quadcopter with deep reinforcement learning, in 2018 1st International Cognitive Cities Conference (IC3), pp. 16–19, Aug 2018. [60] C. Y. Ho, S. Y. Tseng, C. F. Lai, M. S. Wang, and C. J. Chen, “A parameter sharing method for reinforcement learning model between airsim and uavs, in 2018 1st Inter- national Cognitive Cities Conference (IC3), pp. 20–23, Aug 2018. [61] W. Koch, R. Mancuso, R. West, and A. Bestavros, “Reinforcement learning for uav attitude control, ACM Transactions on Cyber-Physical Systems, vol. 3, no. 2, p. 22, 2019. [62] E. D. Cubuk, B. Zoph, D. Mane, V. Vasudevan, and Q. V. Le, “Autoaugment: Learning augmentation policies from data, arXiv preprint arXiv:1805.09501, 2018. [63] J. Huang, N. Li, T. Zhang, G. Li, T. Huang, and W. Gao, “Sap: Self-adaptive proposal model for temporal action detection based on reinforcement learning, in Thirty-Second AAAI Conference on Artificial Intelligence, 2018. [64] D. P. Bertsekas, Dynamic programming and optimal control, vol. 1. Athena scientific Belmont, MA, 1995. [65] L. Buşoniu, B. De Schutter, and R. Babuška, Approximate Dynamic Programming and Reinforcement Learning, pp. 3–44. Berlin, Heidelberg: Springer Berlin Heidelberg, 2010. [66] R. Bellman, “A markovian decision process, Indiana Univ. Math. J., vol. 6, pp. 679– 684, 1957. [67] M. L. Puterman, Markov decision processes: discrete stochastic dynamic programming. John Wiley & Sons, 2014. [68] R. Bellman, Dynamic Programming. Princeton, NJ, USA: Princeton University Press, 1 ed., 1957. [69] C. J. C. H. Watkins, “Learning from delayed rewards, 1989. [70] C. J. Watkins and P. Dayan, “Technical note: Q-learning, Machine Learning, vol. 8, pp. 279–292, May 1992. [71] G. A. Rummery and M. Niranjan, On-line Q-learning using connectionist systems, vol. 37. 1994. [72] M. Tokic, “Adaptive ϵ-greedy exploration in reinforcement learning based on value dif- ferences, in Proceedings of the 33rd Annual German Conference on Advances in Ar- tificial Intelligence, KI’10, (Berlin, Heidelberg), pp. 203–210, Springer-Verlag, 2010. [73] J. E. Smith and R. L. Winkler, “The optimizer’s curse: Skepticism and postdecision surprise in decision analysis, Management Science, vol. 52, no. 3, pp. 311–322, 2006. [74] Z. Wang, T. Schaul, M. Hessel, H. Van Hasselt, M. Lanctot, and N. De Freitas, “Duel- ing network architectures for deep reinforcement learning, in Proceedings of the 33rd International Conference on International Conference on Machine Learning-Volume 48, pp. 1995–2003, JMLR. org, 2016. [75] V. Mnih, A. P. Badia, M. Mirza, A. Graves, T. Lillicrap, T. Harley, D. Silver, and K. Kavukcuoglu, “Asynchronous methods for deep reinforcement learning, in Inter- national conference on machine learning, pp. 1928–1937, 2016. [76] M. Babaeizadeh, I. Frosio, S. Tyree, J. Clemons, and J. Kautz, “Reinforcement learning through asynchronous advantage actor-critic on a gpu, in Learning Representations, pp. 1–12, ICLR, 2017. [77] T. Schaul, J. Quan, I. Antonoglou, and D. Silver, “Prioritized experience replay, CoRR, vol. abs/1511.05952, 2016. [78] D. Horgan, J. Quan, D. Budden, G. Barth-Maron, M. Hessel, H. van Hasselt, and D. Sil- ver, “Distributed prioritized experience replay, in International Conference on Learn- ing Representations, 2018. [79] H. van Seijen, H. van Hasselt, S. Whiteson, and M. Wiering, “A theoretical and em- pirical analysis of expected sarsa, in 2009 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, pp. 177–184, March 2009. [80] L. C. Baird, “Reinforcement learning in continuous time: Advantage updating, in Proceedings of 1994 IEEE International Conference on Neural Networks (ICNN’94), vol. 4, pp. 2448–2453, IEEE, 1994. [81] S. Syafiie, F. Tadeo, and E. Martinez, “Softmax and ε-greedy policies applied to pro- cess control, IFAC Proceedings Volumes, vol. 37, no. 12, pp. 729 – 734, 2004. IFAC Workshop on Adaptation and Learning in Control and Signal Processing (ALCOSP 04) and IFAC Workshop on Periodic Control Systems (PSYCO 04), Yokohama, Japan, 30 August - 1 September, 2004. [82] N. Ding and R. Soricut, “Cold-start reinforcement learning with softmax policy gradi- ent, in Advances in Neural Information Processing Systems, pp. 2817–2826, 2017. [83] K. Asadi and M. L. Littman, “An alternative softmax operator for reinforcement learn- ing, in Proceedings of the 34th International Conference on Machine Learning - Vol- ume 70, ICML’17, pp. 243–252, JMLR.org, 2017. [84] G. Brockman, V. Cheung, L. Pettersson, J. Schneider, J. Schulman, J. Tang, and W. Zaremba, “Openai gym, arXiv preprint arXiv:1606.01540, 2016. [85] B. Widrow and F. W. Smith, “Pattern-recognizing control systems, 1964. [86] A. W. Moore, “Efficient memory-based learning for robot control, 1990. [87] M. W. Spong, “The swing up control problem for the acrobot, IEEE Control Systems Magazine, vol. 15, pp. 49–55, Feb 1995. [88] M. G. Bellemare, Y. Naddaf, J. Veness, and M. Bowling, “The arcade learning envi- ronment: An evaluation platform for general agents, Journal of Artificial Intelligence Research, vol. 47, pp. 253–279, 2013. [89] T. Erez, Y. Tassa, and E. Todorov, “Infinite horizon model predictive control for non- linear periodic tasks, Manuscript under review, vol. 4, 2011.
|