|
A. J. Piergiovanni and M. S. Ryoo, “Fine-grained activity recognition in baseball videos,” IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, vol. 2018-June, pp. 1821–1830, 2018.
G. Kanojia, S. Kumawat, and S. Raman, “Attentive spatio-temporal representation learning for diving classification,” IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, vol. 2019-June, pp.
K. Soomro, A. R. Zamir, and M. Shah, “UCF101: A Dataset of 101 Human Actions Classes From Videos in The Wild,” no. November, 2012. [Online]. Available: http://arxiv.org/abs/1212.0402
H. Kuehne, H. Jhuang, E. Garrote, T. Poggio, and T. Serre, “HMDB51: A large video database for human motion recognition,” 11 2011, pp. 2556–2563.
W. Kay, J. Carreira, K. Simonyan, B. Zhang, C. Hillier, S. Vijayanarasimhan, F. Viola, T. Green, T. Back, P. Natsev, M. Suleyman, and A. Zisserman, “The Kinetics Human Action Video Dataset,” 2017. [Online]. Available: http://arxiv.org/abs/1705.06950
G. A. Sigurdsson, G. Varol, X. Wang, A. Farhadi, I. Laptev, and A. Gupta, “Hollywood in homes: Crowdsourcing data collection for activity understanding,” Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 9905 LNCS, pp. 510–526, 2016.
Y. Li, Y. Li, and N. Vasconcelos, “RESOUND: Towards action recognition without representation bias,” Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 11210 LNCS, pp. 520–535, 2018.
H. Wang, A. Kläser, C. Schmid, and C. Liu, “Action recognition by dense trajectories,” in CVPR 2011, 2011, pp. 3169–3176.
H. Wang and C. Schmid, “Action recognition with improved trajectories,” Proceedings of the IEEE International Conference on Computer Vision, pp. 3551–3558, 2013.
A. Krizhevsky, I. Sutskever, and G. Hinton, “ImageNet classification with deep convolutional neural networks,” Neural Information Processing Systems, vol. 25, 01 2012.
K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” 3rd International Conference on Learning Representations, ICLR 2015 - Conference Track Proceedings, pp. 1–14, 2015.
J. Yue-Hei Ng, M. Hausknecht, S. Vijayanarasimhan, O. Vinyals, R. Monga, and G. Toderici, “Beyond short snippets: Deep networks for video classification,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2015.
S. Hochreiter and J. Schmidhuber, “Long Short-Term Memory,” Neural Computation, vol. 9, no. 8, pp. 1735–1780, 1997.
A. J. Piergiovanni, C. Fan, and M. S. Ryoo, “Learning latent subevents in activity videos using temporal attention filters,” in AAAI, 2017.
D. Tran, L. Bourdev, R. Fergus, L. Torresani, and M. Paluri, “Learning spatiotemporal features with 3D convolutional networks,” Proceedings of the IEEE International Conference on Computer Vision, vol. 2015 International Conference on Computer Vision, ICCV 2015, pp. 4489–4497, 2015.
J. Carreira and A. Zisserman, “Quo vadis, action recognition? A new model and the Kinetics dataset,” in 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 4724–4733.
K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 770–778.
S. Ren, K. He, R. Girshick, and J. Sun, “Faster R-CNN: Towards real-time object detection with region proposal networks,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 6, pp. 1137–1149, 2017.
|