[1] D. Lian, Z. Yu, and S. Gao, “Believe it or not, we know what you are looking at!,” in Asian Conference on Computer Vision (ACCV), 2018.
[2] E. Chong, Y. Wang, N. Ruiz, and J. M. Rehg, “Detecting attended visual targets in video,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2020.
[3] E. Chong, N. Ruiz, Y. Wang, Y. Zhang, A. Rozga, and J. M. Rehg, “Connecting gaze, scene, and attention: Generalized attention estimation via joint modeling of gaze and scene saliency,” in Proceedings of the European Conference on Computer Vision (ECCV), September 2018.
[4] P. A. Dias, D. Malafronte, H. Medeiros, and F. Odone, “Gaze estimation for assisted living environments,” in Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), March 2020.
[5] X. Zhong, X. Qu, C. Ding, and D. Tao, “Glance and gaze: Inferring action-aware points for one-stage human-object interaction detection,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 13234–13243, June 2021.
[6] H. Tomas, M. Reyes, R. Dionido, M. Ty, J. Casimiro, R. Atienza, and R. Guinto, “GOO: A dataset for gaze object prediction in retail environments,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2021.
[7] K. Campbell, K. L. H. Carpenter, J. Hashemi, S. Espinosa, S. Marsan, J. S. Borg, Z. Chang, Q. Qiu, S. Vermeer, E. Adler, M. Tepper, H. L. Egger, J. P. Baker, G. Sapiro, and G. Dawson, “Computer vision analysis captures atypical attention in toddlers with autism,” Autism, vol. 23, no. 3, pp. 619–628, 2019.
[8] A. Recasens, A. Khosla, C. Vondrick, and A. Torralba, “Where are they looking?,” in Advances in Neural Information Processing Systems (C. Cortes, N. Lawrence, D. Lee, M. Sugiyama, and R. Garnett, eds.), vol. 28, Curran Associates, Inc., 2015.
[9] Y. Fang, J. Tang, W. Shen, W. Shen, X. Gu, L. Song, and G. Zhai, “Dual attention guided gaze target detection in the wild,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11390–11399, June 2021.
[10] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” arXiv preprint arXiv:1512.03385, 2015.
[11] P. Kellnhofer, A. Recasens, S. Stent, W. Matusik, and A. Torralba, “Gaze360: Physically unconstrained gaze estimation in the wild,” in Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), October 2019.
[12] A. Bulat and G. Tzimiropoulos, “How far are we from solving the 2D & 3D face alignment problem? (and a dataset of 230,000 3D facial landmarks),” in Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2017.
[13] R. Ranftl, K. Lasinger, D. Hafner, K. Schindler, and V. Koltun, “Towards robust monocular depth estimation: Mixing datasets for zero-shot cross-dataset transfer,” IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2020.
[14] Y. Cheng and F. Lu, “Gaze estimation using transformer,” arXiv preprint, 2021.
[15] R. Siegfried and J.-M. Odobez, “Visual focus of attention estimation in 3D scene with an arbitrary number of targets,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, pp. 3153–3161, June 2021.
[16] S. Ghosh, M. Hayat, A. Dhall, and J. Knibbe, “MTGLS: Multi-task gaze estimation with limited supervision,” arXiv preprint, 2021.
[17] Y. Cheng, S. Huang, F. Wang, C. Qian, and F. Lu, “A coarse-to-fine adaptive network for appearance-based gaze estimation,” Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, pp. 10623–10630, April 2020.
[18] M. L. R. D. and P. Biswas, “Appearance-based gaze estimation using attention and difference mechanism,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, pp. 3143–3152, June 2021.
[19] N. Liu, N. Zhang, K. Wan, L. Shao, and J. Han, “Visual saliency transformer,” in Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pp. 4722–4732, October 2021.
[20] P. Sun, W. Zhang, H. Wang, S. Li, and X. Li, “Deep RGB-D saliency detection with depth-sensitive attention and automatic multi-modal fusion,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1407–1417, June 2021.
[21] S. Gorji and J. J. Clark, “Attentional push: A deep convolutional network for augmenting image salience with shared attention modeling in social scenes,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017.
[22] D. Parks, A. Borji, and L. Itti, “Augmented saliency model using automatic 3D head pose detection and learned gaze following in natural scenes,” Vision Research, vol. 116, pp. 113–126, 2015. Special issue on Computational Models of Visual Attention.
[23] Z. Bylinskii, A. Recasens, A. Borji, A. Oliva, A. Torralba, and F. Durand, “Where should saliency models look next?,” in Computer Vision – ECCV 2016 (B. Leibe, J. Matas, N. Sebe, and M. Welling, eds.), (Cham), pp. 809–824, Springer International Publishing, 2016.
[24] R. Ranftl, A. Bochkovskiy, and V. Koltun, “Vision transformers for dense prediction,” arXiv preprint, 2021.
[25] I. Loshchilov and F. Hutter, “Decoupled weight decay regularization,” in International Conference on Learning Representations (ICLR), 2019.
[26] X. Glorot and Y. Bengio, “Understanding the difficulty of training deep feedforward neural networks,” in Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics (AISTATS), 2010.
[27] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, A. Desmaison, A. Kopf, E. Yang, Z. DeVito, M. Raison, A. Tejani, S. Chilamkurthy, B. Steiner, L. Fang, J. Bai, and S. Chintala, “PyTorch: An imperative style, high-performance deep learning library,” in Advances in Neural Information Processing Systems 32 (H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett, eds.), pp. 8024–8035, Curran Associates, Inc., 2019.
[28] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, “ImageNet: A large-scale hierarchical image database,” in 2009 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 248–255, 2009.