Ba, J. and Caruana, R. (2014). Do deep nets really need to be deep? In Ghahramani, Z., Welling, M., Cortes, C., Lawrence, N. D., and Weinberger, K. Q., editors, Advances in Neural Information Processing Systems 27, pages 2654–2662. Curran Associates, Inc.

Courbariaux, M., Bengio, Y., and David, J.-P. (2015). BinaryConnect: Training deep neural networks with binary weights during propagations. In Cortes, C., Lawrence, N. D., Lee, D. D., Sugiyama, M., and Garnett, R., editors, Advances in Neural Information Processing Systems 28, pages 3123–3131. Curran Associates, Inc.

Du, S. S., Lee, J. D., Li, H., Wang, L., and Zhai, X. (2019). Gradient descent finds global minima of deep neural networks. International Conference on Machine Learning, abs/1811.03804.

Falkner, S., Klein, A., and Hutter, F. (2018). BOHB: Robust and efficient hyperparameter optimization at scale. In Dy, J. and Krause, A., editors, Proceedings of the 35th International Conference on Machine Learning, volume 80 of Proceedings of Machine Learning Research, pages 1437–1446, Stockholmsmässan, Stockholm, Sweden. PMLR.

Frankle, J. and Carbin, M. (2019). The lottery ticket hypothesis: Training pruned neural networks. International Conference on Learning Representations, abs/1803.03635.

Gale, T., Elsen, E., and Hooker, S. (2019). The state of sparsity in deep neural networks. ArXiv:cs.LG/1902.09574.

Gao, Y., Beijbom, O., Zhang, N., and Darrell, T. (2015). Compact bilinear pooling. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, abs/1511.06062.

Haeffele, B. D. and Vidal, R. (2017). Global optimality in neural network training. In IEEE Conference on Computer Vision and Pattern Recognition, pages 4390–4398.

Han, S., Mao, H., and Dally, W. J. (2016). Deep compression: Compressing deep neural networks with pruning, trained quantization and Huffman coding. In Bengio, Y. and LeCun, Y., editors, 4th International Conference on Learning Representations, ICLR 2016, San Juan, Puerto Rico, May 2-4, 2016, Conference Track Proceedings.

Han, S., Pool, J., Tran, J., and Dally, W. J. (2015). Learning both weights and connections for efficient neural networks. Advances in Neural Information Processing Systems, abs/1506.02626.

Hassibi, B., Stork, D. G., and Wolff, G. J. (1993). Optimal brain surgeon and general network pruning. In IEEE International Conference on Neural Networks, pages 293–299, vol. 1.

He, Y. and Han, S. (2018). ADC: Automated deep compression and acceleration with reinforcement learning. European Conference on Computer Vision, abs/1802.03494.

He, Y., Kang, G., Dong, X., Fu, Y., and Yang, Y. (2018). Soft filter pruning for accelerating deep convolutional neural networks. International Joint Conference on Artificial Intelligence, abs/1808.06866.

He, Y., Zhang, X., and Sun, J. (2017). Channel pruning for accelerating very deep neural networks. International Conference on Computer Vision, abs/1707.06168.

Hinton, G., Vinyals, O., and Dean, J. (2015). Distilling the knowledge in a neural network. In NIPS Deep Learning and Representation Learning Workshop.

Howard, A. G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., and Adam, H. (2017). MobileNets: Efficient convolutional neural networks for mobile vision applications. ArXiv:1704.04861.

Hsu, Y.-C., Hong, C.-Y., Chen, D.-J., Lee, M.-S., Geiger, D., and Liu, T.-L. (2019). Fine-grained visual recognition with batch confusion norm. ArXiv:1910.12423.

Ioffe, S. and Szegedy, C. (2015). Batch normalization: Accelerating deep network training by reducing internal covariate shift. International Conference on Machine Learning, abs/1502.03167.

Jaderberg, M., Vedaldi, A., and Zisserman, A. (2014). Speeding up convolutional neural networks with low rank expansions. BMVC 2014 - Proceedings of the British Machine Vision Conference 2014.

Kar, P. and Karnick, H. (2012). Random feature maps for dot product kernels. In Lawrence, N. D. and Girolami, M., editors, Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics, volume 22 of Proceedings of Machine Learning Research, pages 583–591, La Palma, Canary Islands. PMLR.

Keskar, N. S., Mudigere, D., Nocedal, J., Smelyanskiy, M., and Tang, P. T. P. (2017). On large-batch training for deep learning: Generalization gap and sharp minima. International Conference on Learning Representations, abs/1609.04836.

Kong, S. and Fowlkes, C. C. (2016). Low-rank bilinear pooling for fine-grained classification. Conference on Computer Vision and Pattern Recognition, abs/1611.05109.

Krause, J., Stark, M., Deng, J., and Fei-Fei, L. (2013). 3D object representations for fine-grained categorization. In 4th International IEEE Workshop on 3D Representation and Recognition (3dRR-13), Sydney, Australia.

Krizhevsky, A., Sutskever, I., and Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. In Pereira, F., Burges, C. J. C., Bottou, L., and Weinberger, K. Q., editors, Advances in Neural Information Processing Systems 25, pages 1097–1105. Curran Associates, Inc.

LeCun, Y., Denker, J. S., and Solla, S. A. (1990). Optimal brain damage. In Touretzky, D. S., editor, Advances in Neural Information Processing Systems 2, pages 598–605. Morgan-Kaufmann.

Li, H., Kadav, A., Durdanovic, I., Samet, H., and Graf, H. P. (2016). Pruning filters for efficient ConvNets. International Conference on Learning Representations, abs/1608.08710.

Lin, T., RoyChowdhury, A., and Maji, S. (2015). Bilinear CNN models for fine-grained visual recognition. In IEEE International Conference on Computer Vision, pages 1449–1457.

Liu, Z., Li, J., Shen, Z., Huang, G., Yan, S., and Zhang, C. (2017). Learning efficient convolutional networks through network slimming. IEEE International Conference on Computer Vision, abs/1708.06519.

Liu, Z., Sun, M., Zhou, T., Huang, G., and Darrell, T. (2018). Rethinking the value of network pruning. International Conference on Learning Representations, abs/1810.05270.

Luo, J., Wu, J., and Lin, W. (2017). ThiNet: A filter level pruning method for deep neural network compression. International Conference on Computer Vision, abs/1707.06342.

Maji, S., Kannala, J., Rahtu, E., Blaschko, M., and Vedaldi, A. (2013). Fine-grained visual classification of aircraft. Technical report.

Molchanov, P., Tyree, S., Karras, T., Aila, T., and Kautz, J. (2016). Pruning convolutional neural networks for resource efficient transfer learning. International Conference on Learning Representations, abs/1611.06440.

NervanaSystems (2019). Distiller: Pruning filters and channels. Accessed: 2020-05-15.

Oguntola, I., Olubeko, S., and Sweeney, C. (2018). SlimNets: An exploration of deep model compression and acceleration. IEEE High Performance Extreme Computing Conference, abs/1808.00496.

Pham, N. and Pagh, R. (2013). Fast and scalable polynomial kernels via explicit feature maps. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '13, pages 239–247, New York, NY, USA. Association for Computing Machinery.

Simonyan, K. and Zisserman, A. (2015). Very deep convolutional networks for large-scale image recognition. In Bengio, Y. and LeCun, Y., editors, 3rd International Conference on Learning Representations, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings.

Wah, C., Branson, S., Welinder, P., Perona, P., and Belongie, S. (2011). The Caltech-UCSD Birds-200-2011 dataset. Technical Report CNS-TR-2011-001, California Institute of Technology.

Wei, X., Zhang, Y., Gong, Y., Zhang, J., and Zheng, N. (2018). Grassmann pooling as compact homogeneous bilinear pooling for fine-grained visual classification. In The European Conference on Computer Vision.

Wen, W., Wu, C., Wang, Y., Chen, Y., and Li, H. (2016). Learning structured sparsity in deep neural networks. Advances in Neural Information Processing Systems, abs/1608.03665.

Yeom, S.-K., Seegerer, P., Lapuschkin, S., Wiedemann, S., Müller, K.-R., and Samek, W. (2019). Pruning by explaining: A novel criterion for deep neural network pruning. ArXiv:1912.08881.

Yosinski, J., Clune, J., Bengio, Y., and Lipson, H. (2014). How transferable are features in deep neural networks? Neural Information Processing Systems, abs/1411.1792.

You, K., Long, M., Wang, J., and Jordan, M. I. (2019). How does learning rate decay help modern neural networks? ArXiv:1908.01878.

Yuan, M. and Lin, Y. (2006). Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society, Series B, 68:49–67.

Zhu, M. and Gupta, S. (2018). To prune, or not to prune: Exploring the efficacy of pruning for model compression. International Conference on Learning Representations, Workshop Track, abs/1710.01878.

Zoph, B. and Le, Q. V. (2016). Neural architecture search with reinforcement learning. International Conference on Learning Representations, abs/1611.01578.