[1] P. Anderson, B. Fernando, M. Johnson, and S. Gould. SPICE: Semantic propositional image caption evaluation. In ECCV, 2016.
[2] P. Anderson, X. He, C. Buehler, D. Teney, M. Johnson, S. Gould, and L. Zhang. Bottom-up and top-down attention for image captioning and visual question answering. In CVPR, 2018.
[3] Y. Bengio and Y. LeCun, editors. 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings, 2015.
[4] M. Chatterjee and A. G. Schwing. Diverse and coherent paragraph generation from images. In ECCV, 2018.
[5] X. Chen and C. L. Zitnick. Mind's eye: A recurrent visual representation for image caption generation. In CVPR, 2015.
[6] K. Clark and C. D. Manning. Deep reinforcement learning for mention-ranking coreference models. In EMNLP, 2016.
[7] Y. Cui, G. Yang, A. Veit, X. Huang, and S. J. Belongie. Learning to evaluate image captioning. In CVPR, 2018.
[8] B. Dai, S. Fidler, R. Urtasun, and D. Lin. Towards diverse and natural image descriptions via a conditional GAN. In ICCV, 2017.
[9] A. Farhadi, S. M. M. Hejrati, M. A. Sadeghi, P. Young, C. Rashtchian, J. Hockenmaier, and D. A. Forsyth. Every picture tells a story: Generating sentences from images. In ECCV, 2010.
[10] M. Hodosh, P. Young, and J. Hockenmaier. Framing image description as a ranking task: Data, models and evaluation metrics (extended abstract). In IJCAI, 2015.
[11] M. Hodosh, P. Young, and J. Hockenmaier. Framing image description as a ranking task: Data, models and evaluation metrics (extended abstract). In IJCAI, 2015.
[12] J. Johnson, A. Gupta, and L. Fei-Fei. Image generation from scene graphs. In CVPR, 2018.
[13] J. Johnson, A. Karpathy, and L. Fei-Fei. DenseCap: Fully convolutional localization networks for dense captioning. In CVPR, 2016.
[14] J. Johnson, R. Krishna, M. Stark, L. Li, D. A. Shamma, M. S. Bernstein, and F. Li. Image retrieval using scene graphs. In CVPR, 2015.
[15] A. Karpathy, A. Joulin, and F. Li. Deep fragment embeddings for bidirectional image sentence mapping. In NIPS, 2014.
[16] A. Karpathy and F. Li. Deep visual-semantic alignments for generating image descriptions. In CVPR, 2015.
[17] D. P. Kingma and J. Ba. Adam: A method for stochastic optimization. In ICLR, 2015.
[18] J. Krause, J. Johnson, R. Krishna, and L. Fei-Fei. A hierarchical approach for generating descriptive image paragraphs. In CVPR, 2017.
[19] R. Krishna, Y. Zhu, O. Groth, J. Johnson, K. Hata, J. Kravitz, S. Chen, Y. Kalantidis, L. Li, D. A. Shamma, M. S. Bernstein, and L. Fei-Fei. Visual Genome: Connecting language and vision using crowdsourced dense image annotations. International Journal of Computer Vision, 123(1):32–73, 2017.
[20] X. Liang, Z. Hu, H. Zhang, C. Gan, and E. P. Xing. Recurrent topic-transition GAN for visual paragraph generation. In ICCV, 2017.
[21] X. Liang, L. Lee, and E. P. Xing. Deep variation-structured reinforcement learning for visual relationship and attribute detection. In CVPR, 2017.
[22] T. Lin, M. Maire, S. J. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick. Microsoft COCO: Common objects in context. In ECCV, 2014.
[23] J. Lu, C. Xiong, D. Parikh, and R. Socher. Knowing when to look: Adaptive attention via a visual sentinel for image captioning. In CVPR, 2017.
[24] J. Lu, J. Yang, D. Batra, and D. Parikh. Neural baby talk. In CVPR, 2018.
[25] D. Marr. Vision: A computational investigation into the human representation and processing of visual information. 1982.
[26] D. Teney, L. Liu, and A. van den Hengel. Graph-structured representations for visual question answering. In CVPR, 2017.
[27] O. Vinyals, A. Toshev, S. Bengio, and D. Erhan. Show and tell: A neural image caption generator. In CVPR, 2015.
[28] Y. Wang, C. Liu, X. Zeng, and A. L. Yuille. Scene graph parsing as dependency parsing. In NAACL, 2018.
[29] S. Woo, D. Kim, D. Cho, and I. S. Kweon. LinkNet: Relational embedding for scene graph. In NIPS, 2018.
[30] D. Xu, Y. Zhu, C. B. Choy, and L. Fei-Fei. Scene graph generation by iterative message passing. In CVPR, 2017.
[31] K. Xu, J. Ba, R. Kiros, K. Cho, A. C. Courville, R. Salakhutdinov, R. S. Zemel, and Y. Bengio. Show, attend and tell: Neural image caption generation with visual attention. In ICML, 2015.
[32] J. Yang, J. Lu, S. Lee, D. Batra, and D. Parikh. Graph R-CNN for scene graph generation. In ECCV, 2018.
[33] X. Yang, K. Tang, H. Zhang, and J. Cai. Auto-encoding scene graphs for image captioning. CoRR, abs/1812.02378, 2018.
[34] Z. Yang, Y. Yuan, Y. Wu, W. W. Cohen, and R. Salakhutdinov. Review networks for caption generation. In NIPS, 2016.
[35] T. Yao, Y. Pan, Y. Li, and T. Mei. Exploring visual relationship for image captioning. In ECCV, 2018.
[36] Q. You, H. Jin, Z. Wang, C. Fang, and J. Luo. Image captioning with semantic attention. In CVPR, 2016.
[37] R. Zellers, M. Yatskar, S. Thomson, and Y. Choi. Neural motifs: Scene graph parsing with global context. In CVPR, 2018.
[38] H. Zhang, Z. Kyaw, S. Chang, and T. Chua. Visual translation embedding network for visual relation detection. In CVPR, 2017.
[39] Z. Zhu, Z. Xue, and Z. Yuan. Topic-guided attention for image captioning. In ICIP, 2018.