References

[1] P. Isola, J. J. Lim, and E. H. Adelson, “Discovering states and transformations in image collections,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 1383–1391.

[2] A. Yu and K. Grauman, “Fine-grained visual comparisons with local learning,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 192–199.

[3] L. Breiman, “Bagging predictors,” Machine Learning, vol. 24, pp. 123–140, 1996.

[4] Y. Freund, R. E. Schapire et al., “Experiments with a new boosting algorithm,” in Proceedings of the International Conference on Machine Learning (ICML), vol. 96, 1996, pp. 148–156.

[5] Y. Freund and R. E. Schapire, “A decision-theoretic generalization of on-line learning and an application to boosting,” Journal of Computer and System Sciences, vol. 55, no. 1, pp. 119–139, 1997.

[6] R. A. Jacobs, M. I. Jordan, S. J. Nowlan, and G. E. Hinton, “Adaptive mixtures of local experts,” Neural Computation, vol. 3, no. 1, pp. 79–87, 1991.

[7] G. E. Hinton, “Training products of experts by minimizing contrastive divergence,” Neural Computation, vol. 14, no. 8, pp. 1771–1800, 2002.

[8] R. Polikar, “Ensemble learning,” in Ensemble Machine Learning: Methods and Applications, pp. 1–34, 2012.

[9] E. S. Aimar, A. Jonnarth, M. Felsberg, and M. Kuhlmann, “Balanced product of experts for long-tailed recognition,” arXiv preprint arXiv:2206.05260, 2022.

[10] D. D. Hoffman and W. A. Richards, “Parts of recognition,” Cognition, vol. 18, no. 1–3, pp. 65–96, 1984.

[11] C. H. Lampert, H. Nickisch, and S. Harmeling, “Learning to detect unseen object classes by between-class attribute transfer,” in 2009 IEEE Conference on Computer Vision and Pattern Recognition, IEEE, 2009, pp. 951–958.

[12] I. Misra, A. Gupta, and M. Hebert, “From red wine to red tomato: Composition with context,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 1792–1801.

[13] N. Saini, K. Pham, and A. Shrivastava, “Disentangling visual embeddings for attributes and objects,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 13658–13667.

[14] S. Kumar, A. Iftekhar, E. Prashnani, and B. Manjunath, “LOCL: Learning object-attribute composition using localization,” arXiv preprint arXiv:2210.03780, 2022.

[15] X. Li, X. Yang, K. Wei, C. Deng, and M. Yang, “Siamese contrastive embedding network for compositional zero-shot learning,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 9326–9335.

[16] X. Lu, S. Guo, Z. Liu, and J. Guo, “Decomposed soft prompt guided fusion enhancing for compositional zero-shot learning,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 23560–23569.

[17] C. Wang, C. Lawrence, and M. Niepert, “Uncertainty estimation and calibration with finite-state probabilistic RNNs,” arXiv preprint arXiv:2011.12010, 2020.

[18] M. Huang and Y. Qiao, “Uncertainty-estimation with normalized logits for out-of-distribution detection,” in International Conference on Computer, Artificial Intelligence, and Control Engineering (CAICE 2023), vol. 12645, SPIE, 2023, pp. 524–530.

[19] C. Chen, A. Seff, A. Kornhauser, and J. Xiao, “DeepDriving: Learning affordance for direct perception in autonomous driving,” in Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 2722–2730.

[20] A. Esteva, B. Kuprel, R. A. Novoa, J. Ko, S. M. Swetter, H. M. Blau, and S. Thrun, “Dermatologist-level classification of skin cancer with deep neural networks,” Nature, vol. 542, no. 7639, pp. 115–118, 2017.

[21] M.-H. Laves, S. Ihler, K.-P. Kortmann, and T. Ortmaier, “Uncertainty calibration error: A new metric for multi-class classification,” 2020.

[22] A. Radford, J. W. Kim, C. Hallacy, A. Ramesh, G. Goh, S. Agarwal, G. Sastry, A. Askell, P. Mishkin, J. Clark et al., “Learning transferable visual models from natural language supervision,” in International Conference on Machine Learning, PMLR, 2021, pp. 8748–8763.

[23] A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly et al., “An image is worth 16x16 words: Transformers for image recognition at scale,” arXiv preprint arXiv:2010.11929, 2020.