|
[1] D. E. Rumelhart, G. E. Hinton, and R. J. Williams, “Learning representations by back-propagating errors,” Nature, vol. 323, no. 6088, p. 533, 1986. [2] M. Jaderberg, W. M. Czarnecki, S. Osindero, O. Vinyals, A. Graves, D. Silver, and K. Kavukcuoglu, “Decoupled neural interfaces using synthetic gradients,” in Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 1627–1635, JMLR. org, 2017. [3] S. Hochreiter, Y. Bengio, P. Frasconi, and J. Schmidhuber, “Gradient flow in recurrent nets: The difficulty of learning long-term dependencies,” in A Field Guide to Dynamical Recurrent Neural Networks (S. C. Kremer and J. F. Kolen, eds.), IEEE Press, 2001. [4] X. Glorot, A. Bordes, and Y. Bengio, “Deep sparse rectifier neural networks,” in Proceedings of the fourteenth International Conference on Artificial Intelligence and Statistics, pp. 315–323, 2011. [5] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural Computation, vol. 9, no. 8, pp. 1735–1780, 1997. [6] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778, 2016. [7] S. Ioffe and C. Szegedy, “Batch normalization: Accelerating deep network training by reducing internal covariate shift,” in Proceedings of the 32Nd International Conference on International Conference on Machine Learning - Volume 37, ICML’15, pp. 448–456, JMLR.org, 2015. [8] R. Pascanu, T. Mikolov, and Y. Bengio, “On the difficulty of training recurrent neural networks,” in International Conference on Machine Learning, pp. 1310–1318, 2013. [9] F. Crick, “The recent excitement about neural networks.,” Nature, vol. 337, no. 6203, pp. 129–132, 1989. [10] D. Balduzzi, H. Vanchinathan, and J. M. Buhmann, “Kickback cuts backprop’s red-tape: Biologically plausible credit assignment in neural networks.,” in AAAI, pp. 485–491, 2015. [11] T. P. Lillicrap, D. Cownden, D. B. Tweed, and C. J. Akerman, “Random synaptic feedback weights support error backpropagation for deep learning,” Nature Communications, vol. 7, p. 13276, 2016. [12] A. Nøkland, “Direct feedback alignment provides learning in deep neural networks,” in Advances in Neural Information Processing Systems, pp. 1037–1045, 2016. [13] A. G. Ororbia, A. Mali, D. Kifer, and C. L. Giles, “Conducting credit assignment by aligning local representations,” arXiv preprint arXiv:1803.01834, 2018. [14] A. G. Ororbia and A. Mali, “Biologically motivated algorithms for propagating local target representations,” arXiv preprint arXiv:1805.11703, 2018. [15] S. Bartunov, A. Santoro, B. Richards, L. Marris, G. E. Hinton, and T. Lillicrap, “Assessing the scalability of biologically-motivated deep learning algorithms and architectures,” in Advances in Neural Information Processing Systems, pp. 9390–9400, 2018. [16] A. Nøkland and L. H. Eidnes, “Training neural networks with local error signals.,” in ICML, vol. 97 of Proceedings of Machine Learning Research, pp. 4839–4850, PMLR, 2019. [17] Y. Bengio, “How auto-encoders could provide credit assignment in deep networks via target propagation,” arXiv preprint arXiv:1407.7906, 2014. [18] D.-H. Lee, S. Zhang, A. Fischer, and Y. Bengio, “Difference target propagation,” in Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pp. 498–515, Springer, 2015. [19] L. v. d. Maaten and G. Hinton, “Visualizing data using t-sne,” Journal of Machine Learning Research, vol. 9, no. Nov, pp. 2579–2605, 2008. [20] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition.,” in ICLR (Y. Bengio and Y. LeCun, eds.), 2015. [21] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324, 1998. [22] A. Krizhevsky and G. Hinton, “Learning multiple layers of features from tiny images,” tech. rep., Citeseer, 2009. [23] S. Sabour, N. Frosst, and G. E. Hinton, “Dynamic routing between capsules,” in Advances in Neural Information Processing Systems, pp. 3856–3866, 2017. [24] M. Michael and W.-C. Lin, “Experimental study of information measure and interintra class distance ratios on feature selection and orderings,” IEEE Transactions on Systems, Man, and Cybernetics, no. 2, pp. 172–181, 1973. [25] Y. Luo, Y. Wong, M. Kankanhalli, and Q. Zhao, “G-softmax: Improving intraclass compactness and interclass separability of features,” IEEE Transactions on Neural Networks and Learning Systems, 2019. [26] Y. Bengio, D.-H. Lee, J. Bornschein, T. Mesnard, and Z. Lin, “Towards biologically plausible deep learning,” arXiv preprint arXiv:1502.04156, 2015. [27] G. Taylor, R. Burmeister, Z. Xu, B. Singh, A. Patel, and T. Goldstein, “Training neural networks without gradients: A scalable admm approach,” in International Conference on Machine Learning, pp. 2722–2731, 2016. [28] Z. Huo, B. Gu, Q. Yang, and H. Huang, “Decoupled parallel backpropagation with convergence guarantee.,” in ICML, vol. 80 of Proceedings of Machine Learning Research, pp. 2103–2111, PMLR, 2018. [29] Z. Huo, B. Gu, and H. Huang, “Training neural networks using features replay,” in Advances in Neural Information Processing Systems, pp. 6659–6668, 2018. [30] H. Mostafa, V. Ramesh, and G. Cauwenberghs, “Deep supervised learning using local errors,” Frontiers in Neuroscience, vol. 12, p. 608, 2018. [31] D. D. Lee and H. S. Seung, “Learning the parts of objects by non-negative matrix factorization,” Nature, vol. 401, no. 6755, p. 788, 1999. [32] R. Raina, A. Battle, H. Lee, B. Packer, and A. Y. Ng, “Self-taught learning: transfer learning from unlabeled data,” in Proceedings of the 24th International Conference on Machine Learning, pp. 759–766, ACM, 2007. [33] A. Coates and A. Y. Ng, “Selecting receptive fields in deep networks,” in Advances in Neural Information Processing Systems, pp. 2528–2536, 2011. [34] P. Baldi, “Autoencoders, unsupervised learning, and deep architectures,” in Proceedings of ICML Workshop on Unsupervised and Transfer Learning, pp. 37–49, 2012.
|