[1] S. Saha, G. Singh, M. Sapienza, P. H. S. Torr, and F. Cuzzolin, "Deep learning for detecting multiple space-time action tubes in videos," in Proceedings of the British Machine Vision Conference, 2016.
[2] S. Ren, K. He, R. Girshick, and J. Sun, "Faster R-CNN: Towards real-time object detection with region proposal networks," Advances in Neural Information Processing Systems (NIPS), pp. 91–99, 2015.
[3] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.-Y. Fu, and A. C. Berg, "SSD: Single shot multibox detector," in Proceedings of the European Conference on Computer Vision, pp. 21–37, 2016.
[4] J. Redmon, S. K. Divvala, R. B. Girshick, and A. Farhadi, "You only look once: Unified, real-time object detection," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788, 2016.
[5] G. Gkioxari and J. Malik, "Finding action tubes," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 759–768, 2015.
[6] P. Weinzaepfel, Z. Harchaoui, and C. Schmid, "Learning to track for spatiotemporal action localization," in Proceedings of the IEEE International Conference on Computer Vision, pp. 3164–3172, 2015.
[7] G. Yu and J. Yuan, "Fast action proposals for human action detection and search," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1302–1311, 2015.
[8] A. Kläser, M. Marszałek, C. Schmid, and A. Zisserman, "Human focused action localization in video," in Proceedings of the European Conference on Computer Vision, pp. 219–233, 2010.
[9] D. Oneata, J. Verbeek, and C. Schmid, "Action and event recognition with Fisher vectors on a compact feature set," in Proceedings of the IEEE International Conference on Computer Vision, pp. 1817–1824, 2013.
[10] Y. Tian, R. Sukthankar, and M. Shah, "Spatiotemporal deformable part models for action detection," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2642–2649, 2013.
[11] Z. Shu, K. Yun, and D. Samaras, "Action detection with improved dense trajectories and sliding window," in Proceedings of the European Conference on Computer Vision, pp. 541–551, 2014.
[12] J. C. van Gemert, M. Jain, E. Gati, and C. G. M. Snoek, "APT: Action localization proposals from dense trajectories," in Proceedings of the British Machine Vision Conference, pp. 177.1–177.12, 2015.
[13] R. Sibson, "SLINK: An optimally efficient algorithm for the single-link cluster method," The Computer Journal, pp. 30–34, 1973.
[14] P. Mettes, J. C. van Gemert, and C. G. M. Snoek, "Spot on: Action localization from pointly-supervised proposals," in Proceedings of the European Conference on Computer Vision, pp. 437–453, 2016.
[15] L. Wang, Y. Qiao, X. Tang, and L. Van Gool, "Actionness estimation using hybrid fully convolutional networks," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2708–2717, 2016.
[16] E. H. P. Alwando, Y. T. Chen, and W. H. Fang, "Multiple path search for action tube detection in videos," in Proceedings of the IEEE International Conference on Image Processing, 2017.
[17] M.-M. Cheng, Z. Zhang, W.-Y. Lin, and P. Torr, "BING: Binarized normed gradients for objectness estimation at 300fps," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3286–3293, 2014.
[18] P. Rantalankila, J. Kannala, and E. Rahtu, "Generating object segmentation proposals using global and local search," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2417–2424, 2014.
[19] I. Endres and D. Hoiem, "Category-independent object proposals with diverse ranking," IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 222–234, 2014.
[20] S. Manen, M. Guillaumin, and L. Van Gool, "Prime object proposals with randomized Prim's algorithm," in Proceedings of the IEEE International Conference on Computer Vision, pp. 2536–2543, 2013.
[21] J. Carreira and C. Sminchisescu, "CPMC: Automatic object segmentation using constrained parametric min-cuts," IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 1312–1328, 2012.
[22] J. R. Uijlings, K. E. van de Sande, T. Gevers, and A. W. Smeulders, "Selective search for object recognition," International Journal of Computer Vision, pp. 154–171, 2013.
[23] C. L. Zitnick and P. Dollár, "Edge boxes: Locating object proposals from edges," in Proceedings of the European Conference on Computer Vision, pp. 391–405, 2014.
[24] H. Wang, A. Kläser, C. Schmid, and C.-L. Liu, "Action recognition by dense trajectories," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3169–3176, 2011.
[25] X. Zhang, Y.-H. Yang, Z. Han, H. Wang, and C. Gao, "Object class detection: A survey," ACM Computing Surveys (CSUR), p. 10, 2013.
[26] J. Dai, Y. Li, K. He, and J. Sun, "R-FCN: Object detection via region-based fully convolutional networks," Advances in Neural Information Processing Systems, pp. 379–387, 2016.
[27] P. F. Felzenszwalb, R. B. Girshick, D. McAllester, and D. Ramanan, "Object detection with discriminatively trained part-based models," IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 1627–1645, 2010.
[28] N. Dalal and B. Triggs, "Histograms of oriented gradients for human detection," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 886–893, 2005.
[29] H. Wang, D. Oneata, J. Verbeek, and C. Schmid, "A robust and efficient video representation for action recognition," International Journal of Computer Vision, pp. 219–238, 2016.
[30] J. Sánchez, F. Perronnin, T. Mensink, and J. Verbeek, "Image classification with the Fisher vector: Theory and practice," International Journal of Computer Vision, pp. 222–245, 2013.
[31] J. Sivic and A. Zisserman, "Video Google: A text retrieval approach to object matching in videos," in Proceedings of the IEEE International Conference on Computer Vision, pp. 1470–1477, 2003.
[32] D. G. Lowe, "Distinctive image features from scale-invariant keypoints," International Journal of Computer Vision, pp. 91–110, 2004.
[33] S. Lazebnik, C. Schmid, and J. Ponce, "Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2169–2178, 2006.
[34] T. Ahonen, A. Hadid, and M. Pietikäinen, "Face recognition with local binary patterns," in Proceedings of the European Conference on Computer Vision, pp. 469–481, 2004.
[35] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," Advances in Neural Information Processing Systems, pp. 1097–1105, 2012.
[36] R. Girshick, J. Donahue, T. Darrell, and J. Malik, "Rich feature hierarchies for accurate object detection and semantic segmentation," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587, 2014.
[37] R. Girshick, "Fast R-CNN," in Proceedings of the IEEE International Conference on Computer Vision, 2015.
[38] K. He, X. Zhang, S. Ren, and J. Sun, "Spatial pyramid pooling in deep convolutional networks for visual recognition," in Proceedings of the European Conference on Computer Vision, pp. 346–361, Springer, 2014.
[39] M. D. Zeiler and R. Fergus, "Visualizing and understanding convolutional networks," in Proceedings of the European Conference on Computer Vision, pp. 818–833, 2014.
[40] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," arXiv preprint arXiv:1409.1556, 2014.
[41] J. Donahue, Y. Jia, O. Vinyals, J. Hoffman, N. Zhang, E. Tzeng, and T. Darrell, "DeCAF: A deep convolutional activation feature for generic visual recognition," in Proceedings of the International Conference on Machine Learning, pp. 647–655, 2014.
[42] A. Gaidon, Z. Harchaoui, and C. Schmid, "Temporal localization of actions with actoms," IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 2782–2795, 2013.
[43] I. Laptev and P. Pérez, "Retrieving actions in movies," in Proceedings of the IEEE International Conference on Computer Vision, pp. 1–8, 2007.
[44] P. F. Felzenszwalb, R. B. Girshick, D. McAllester, and D. Ramanan, "Object detection with discriminatively trained part-based models," IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 1627–1645, 2010.
[45] A. Kläser, M. Marszałek, and C. Schmid, "A spatio-temporal descriptor based on 3D-gradients," in Proceedings of the British Machine Vision Conference, 2008.
[46] G. Evangelidis, G. Singh, and R. Horaud, "Continuous gesture recognition from articulated poses," in Proceedings of the European Conference on Computer Vision Workshops, pp. 595–607, 2014.
[47] T. Brox, A. Bruhn, N. Papenberg, and J. Weickert, "High accuracy optical flow estimation based on a theory for warping," in Proceedings of the European Conference on Computer Vision, pp. 25–36, Springer, 2004.
[48] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, "Going deeper with convolutions," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–9, 2015.
[49] P. Sermanet, D. Eigen, X. Zhang, M. Mathieu, R. Fergus, and Y. LeCun, "OverFeat: Integrated recognition, localization and detection using convolutional networks," in Proceedings of the International Conference on Learning Representations, 2014.
[50] D. Erhan, C. Szegedy, A. Toshev, and D. Anguelov, "Scalable object detection using deep neural networks," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2147–2154, 2014.
[51] K. W. Cheng, Y. T. Chen, and W. H. Fang, "Improved object detection with iterative localization refinement in convolutional neural networks," in Proceedings of the IEEE International Conference on Image Processing, pp. 3643–3647, 2016.
[52] S. Gidaris and N. Komodakis, "LocNet: Improving localization accuracy for object detection," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 789–798, 2016.
[53] K. Simonyan and A. Zisserman, "Two-stream convolutional networks for action recognition in videos," Advances in Neural Information Processing Systems, pp. 568–576, 2014.
[54] Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, and L. D. Jackel, "Backpropagation applied to handwritten zip code recognition," Neural Computation, pp. 541–551, 1989.
[55] D. Comaniciu and P. Meer, "Mean shift: A robust approach toward feature space analysis," IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 603–619, 2002.
[56] M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman, "The PASCAL Visual Object Classes Challenge 2007 (VOC2007) results," http://www.pascal-network.org/challenges/VOC/voc2007/workshop/index.html, 2007.
[57] E. K. Chong and S. H. Zak, An Introduction to Optimization. John Wiley & Sons, 2013.
[58] K. Soomro, A. R. Zamir, and M. Shah, "UCF101: A dataset of 101 human actions classes from videos in the wild," CRCV-TR-12-01, 2012.
[59] H. Jhuang, J. Gall, S. Zuffi, C. Schmid, and M. J. Black, "Towards understanding action recognition," in Proceedings of the IEEE International Conference on Computer Vision, pp. 3192–3199, 2013.