[1] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 770–778, 2016.
[2] Z. Liu, Z. Miao, X. Zhan, J. Wang, B. Gong, and S. X. Yu, “Large-scale long-tailed recognition in an open world,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 2537–2546, 2019.
[3] G. Van Horn, O. Mac Aodha, Y. Song, Y. Cui, C. Sun, A. Shepard, H. Adam, P. Perona, and S. Belongie, “The iNaturalist species classification and detection dataset,” in Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 8769–8778, 2018.
[4] C. Hou, J. Zhang, H. Wang, and T. Zhou, “Subclass-balancing contrastive learning for long-tailed recognition,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 5395–5407, 2023.
[5] C. Huang, Y. Li, C. C. Loy, and X. Tang, “Learning deep representation for imbalanced classification,” in Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 5375–5384, 2016.
[6] C. Park, J. Yim, and E. Jun, “Mutual learning for long-tailed recognition,” in Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pp. 2675–2684, 2023.
[7] Y. Jin, M. Li, Y. Lu, Y.-M. Cheung, and H. Wang, “Long-tailed visual recognition via self-heterogeneous integration with knowledge excavation,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 23695–23704, 2023.
[8] J. Li, Z. Tan, J. Wan, Z. Lei, and G. Guo, “Nested collaborative learning for long-tailed visual recognition,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6949–6958, 2022.
[9] J. Ren, C. Yu, X. Ma, H. Zhao, S. Yi, et al., “Balanced meta-softmax for long-tailed visual recognition,” Advances in neural information processing systems, vol. 33, pp. 4175–4186, 2020.
[10] G. Hinton, O. Vinyals, and J. Dean, “Distilling the knowledge in a neural network,” arXiv preprint arXiv:1503.02531, 2015.
[11] Y. Zhang, T. Xiang, T. M. Hospedales, and H. Lu, “Deep mutual learning,” in Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 4320–4328, 2018.
[12] S. Zhang, C. Chen, X. Hu, and S. Peng, “Balanced knowledge distillation for long-tailed learning,” Neurocomputing, vol. 527, pp. 36–46, 2023.
[13] Y.-Y. He, J. Wu, and X.-S. Wei, “Distilling virtual examples for long-tailed recognition,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 235–244, 2021.
[14] H. Dang, T. Nguyen, T. Tran, H. Tran, and N. Ho, “Neural collapse in deep linear network: From balanced to imbalanced data,” arXiv preprint arXiv:2301.00437, 2023.
[15] L. Xie, Y. Yang, D. Cai, and X. He, “Neural collapse inspired attraction–repulsion-balanced loss for imbalanced learning,” Neurocomputing, vol. 527, pp. 60–70, 2023.
[16] Y. Yang, S. Chen, X. Li, L. Xie, Z. Lin, and D. Tao, “Inducing neural collapse in imbalanced learning: Do we really need a learnable classifier at the end of deep neural network?,” Advances in Neural Information Processing Systems, vol. 35, pp. 37991–38002, 2022.
[17] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, “ImageNet: A large-scale hierarchical image database,” in 2009 IEEE conference on computer vision and pattern recognition, pp. 248–255, IEEE, 2009.
[18] S. Zhang, Z. Li, S. Yan, X. He, and J. Sun, “Distribution alignment: A unified framework for long-tail visual recognition,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 2361–2370, 2021.
[19] J. Cui, Z. Zhong, S. Liu, B. Yu, and J. Jia, “Parametric contrastive learning,” in Proceedings of the IEEE/CVF international conference on computer vision, pp. 715–724, 2021.
[20] E. D. Cubuk, B. Zoph, J. Shlens, and Q. V. Le, “RandAugment: Practical automated data augmentation with a reduced search space,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition workshops, pp. 702–703, 2020.
[21] X. Wang, L. Lian, Z. Miao, Z. Liu, and S. X. Yu, “Long-tailed recognition by routing diverse distribution-aware experts,” arXiv preprint arXiv:2010.01809, 2020.
[22] J. Cai, Y. Wang, and J.-N. Hwang, “ACE: Ally complementary experts for solving long-tailed recognition in one-shot,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 112–121, 2021.
[23] Q. Zhao, C. Jiang, W. Hu, F. Zhang, and J. Liu, “MDCS: More diverse experts with consistency self-distillation for long-tailed recognition,” in Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pp. 11597–11608, October 2023.
[24] S. Woo, J. Park, J.-Y. Lee, and I. S. Kweon, “CBAM: Convolutional block attention module,” in Proceedings of the European conference on computer vision (ECCV), pp. 3–19, 2018.
[25] T. Li, P. Cao, Y. Yuan, L. Fan, Y. Yang, R. S. Feris, P. Indyk, and D. Katabi, “Targeted supervised contrastive learning for long-tailed recognition,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6918–6928, 2022.
[26] M. Li, Y.-M. Cheung, and J. Jiang, “Feature-balanced loss for long-tailed visual recognition,” in 2022 IEEE International Conference on Multimedia and Expo (ICME), pp. 1–6, IEEE, 2022.
[27] K. Tang, M. Tao, J. Qi, Z. Liu, and H. Zhang, “Invariant feature learning for generalized long-tailed classification,” in European Conference on Computer Vision, pp. 709–726, Springer, 2022.
[28] E. S. Aimar, A. Jonnarth, M. Felsberg, and M. Kuhlmann, “Balanced product of calibrated experts for long-tailed recognition,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 19967–19977, 2023.
[29] Y. Cui, M. Jia, T.-Y. Lin, Y. Song, and S. Belongie, “Class-balanced loss based on effective number of samples,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 9268–9277, 2019.
[30] M. Li, Y.-M. Cheung, and Y. Lu, “Long-tailed visual recognition via Gaussian clouded logit adjustment,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6929–6938, 2022.
[31] T.-Y. Lin, P. Goyal, R. Girshick, K. He, and P. Dollár, “Focal loss for dense object detection,” in Proceedings of the IEEE international conference on computer vision, pp. 2980–2988, 2017.
[32] M. Buda, A. Maki, and M. A. Mazurowski, “A systematic study of the class imbalance problem in convolutional neural networks,” Neural networks, vol. 106, pp. 249–259, 2018.
[33] J. Wang, W. Zhang, Y. Zang, Y. Cao, J. Pang, T. Gong, K. Chen, Z. Liu, C. C. Loy, and D. Lin, “Seesaw loss for long-tailed instance segmentation,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 9695–9704, 2021.
[34] K. Cao, C. Wei, A. Gaidon, N. Arechiga, and T. Ma, “Learning imbalanced datasets with label-distribution-aware margin loss,” Advances in neural information processing systems, vol. 32, 2019.
[35] Z. Zhong, J. Cui, S. Liu, and J. Jia, “Improving calibration for long-tailed recognition,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 16489–16498, 2021.
[36] B. Zhou, Q. Cui, X.-S. Wei, and Z.-M. Chen, “BBN: Bilateral-branch network with cumulative learning for long-tailed visual recognition,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 9719–9728, 2020.
[37] B. Kang, S. Xie, M. Rohrbach, Z. Yan, A. Gordo, J. Feng, and Y. Kalantidis, “Decoupling representation and classifier for long-tailed recognition,” arXiv preprint arXiv:1910.09217, 2019.
[38] L. Xiang, G. Ding, and J. Han, “Learning from multiple experts: Self-paced knowledge distillation for long-tailed classification,” in Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pp. 247–263, Springer, 2020.
[39] Y. Zhang, B. Hooi, L. Hong, and J. Feng, “Self-supervised aggregation of diverse experts for test-agnostic long-tailed recognition,” Advances in Neural Information Processing Systems, vol. 35, pp. 34077–34090, 2022.
[40] J. Hu, L. Shen, and G. Sun, “Squeeze-and-excitation networks,” in Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 7132–7141, 2018.
[41] J. Park, S. Woo, J.-Y. Lee, and I. S. Kweon, “BAM: Bottleneck attention module,” arXiv preprint arXiv:1807.06514, 2018.
[42] H. Wang, Y. Fan, Z. Wang, L. Jiao, and B. Schiele, “Parameter-free spatial attention network for person re-identification,” arXiv preprint arXiv:1811.12150, 2018.
[43] C. Buciluǎ, R. Caruana, and A. Niculescu-Mizil, “Model compression,” in Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 535–541, 2006.
[44] T. Furlanello, Z. Lipton, M. Tschannen, L. Itti, and A. Anandkumar, “Born again neural networks,” in International Conference on Machine Learning, pp. 1607–1616, PMLR, 2018.
[45] S. I. Mirzadeh, M. Farajtabar, A. Li, N. Levine, A. Matsukawa, and H. Ghasemzadeh, “Improved knowledge distillation via teacher assistant,” in Proceedings of the AAAI conference on artificial intelligence, vol. 34, pp. 5191–5198, 2020.
[46] Q. Guo, X. Wang, Y. Wu, Z. Yu, D. Liang, X. Hu, and P. Luo, “Online knowledge distillation via collaborative learning,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 11020–11029, 2020.
[47] C. Yang, Z. An, H. Zhou, F. Zhuang, Y. Xu, and Q. Zhang, “Online knowledge distillation via mutual contrastive learning for visual recognition,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023.
[48] R. Anil, G. Pereyra, A. Passos, R. Ormandi, G. E. Dahl, and G. E. Hinton, “Large scale distributed neural network training through online distillation,” arXiv preprint arXiv:1804.03235, 2018.