Yoshua Bengio. From system 1 deep learning to system 2 deep learning. NeurIPS, 2019.
Chen-Hsi Chang, Hung-Ting Su, Juiheng Hsu, Yu-Siang Wang, Yu-Cheng Chang, Zhe Yu Liu, Ya-Liang Chang, Wen-Feng Cheng, Ke-Jyun Wang, and Winston H. Hsu. Situation and behavior understanding by trope detection on films. In WWW, 2021.
Kuo-Hao Zeng, Tseng-Hung Chen, Ching-Yao Chuang, Yuan-Hong Liao, Juan Carlos Niebles, and Min Sun. Leveraging video descriptions to learn video question answering. In AAAI, 2017.
Dejing Xu, Zhou Zhao, Jun Xiao, Fei Wu, Hanwang Zhang, Xiangnan He, and Yueting Zhuang. Video question answering via gradually refined attention over appearance and motion. In ACM Multimedia, 2017.
Zhou Yu, Dejing Xu, Jun Yu, Ting Yu, Zhou Zhao, Yueting Zhuang, and Dacheng Tao. ActivityNet-QA: A dataset for understanding complex web videos via question answering. In AAAI, 2019.
Makarand Tapaswi, Yukun Zhu, Rainer Stiefelhagen, Antonio Torralba, Raquel Urtasun, and Sanja Fidler. MovieQA: Understanding stories in movies through question-answering. In CVPR, 2016.
Kyung-Min Kim, Min-Oh Heo, Seong-Ho Choi, and Byoung-Tak Zhang. DeepStory: Video story QA by deep embedded memory networks. In IJCAI, 2017.
Jie Lei, Licheng Yu, Mohit Bansal, and Tamara Berg. TVQA: Localized, compositional video question answering. In EMNLP, 2018.
Jingzhou Liu, Wenhu Chen, Yu Cheng, Zhe Gan, Licheng Yu, Yiming Yang, and Jingjing Liu. VIOLIN: A large-scale dataset for video-and-language inference. In CVPR, 2020.
Thomas Winterbottom, Sarah Xiao, Alistair McLean, and Noura Al Moubayed. On modality bias in the TVQA dataset. In BMVC, 2020.
B. Jasani, R. Girdhar, and D. Ramanan. Are we asking the right questions in MovieQA? In ICCV Workshops, 2019.
Jianing Yang, Yuying Zhu, Yongxin Wang, Ruitao Yi, Amir Zadeh, and Louis-Philippe Morency. What gives the answer away? Question answering bias analysis on video QA datasets. In Human Multimodal Language Workshop, 2020.
Deng Huang, Peihao Chen, Runhao Zeng, Qing Du, Mingkui Tan, and Chuang Gan. Location-aware graph convolutional networks for video question answering. In AAAI, 2020.
Humam Alwassel, Dhruv Mahajan, Bruno Korbar, Lorenzo Torresani, Bernard Ghanem, and Du Tran. Self-supervised learning by cross-modal audio-video clustering. In NeurIPS, 2020.
David Chen and William Dolan. Collecting highly parallel data for paraphrase evaluation. In ACL, 2011.
Jun Xu, Tao Mei, Ting Yao, and Yong Rui. MSR-VTT: A large video description dataset for bridging video and language. In CVPR, 2016.
Michael Heilman and Noah A. Smith. Good question! Statistical ranking for question generation. In HLT-NAACL, 2010.
John R. Smith, Dhiraj Joshi, Benoit Huet, Winston Hsu, and Jozef Cota. Harnessing A.I. for augmenting creativity: Application to movie trailer creation. In ACM MM, 2017.
Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition, 2015.
Saining Xie, Chen Sun, Jonathan Huang, Zhuowen Tu, and Kevin Murphy. Rethinking spatiotemporal feature learning: Speed-accuracy trade-offs in video classification. In ECCV, 2018.
Yusuf Aytar, Carl Vondrick, and Antonio Torralba. SoundNet: Learning sound representations from unlabeled video. In NIPS, 2016.
Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: Pre-training of deep bidirectional transformers for language understanding, 2019.
Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. Faster R-CNN: Towards real-time object detection with region proposal networks. In NIPS, 2015.