|
[1] Hochreiter, S., & Schmidhuber, J. (1997). Long Short-Term Memory. Neural Computation, 9, 1735-1780.
[2] Bahdanau, D., Cho, K., & Bengio, Y. (2015). Neural Machine Translation by Jointly Learning to Align and Translate. CoRR, abs/1409.0473.
[3] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, L., & Polosukhin, I. (2017). Attention is All you Need. ArXiv, abs/1706.03762.
[4] Sukhbaatar, S., Szlam, A., Weston, J., & Fergus, R. (2015). End-To-End Memory Networks. NIPS.
[5] Shaw, P., Uszkoreit, J., & Vaswani, A. (2018). Self-Attention with Relative Position Representations. NAACL-HLT.
[6] Potapov, D., Douze, M., Harchaoui, Z., & Schmid, C. (2014). Category-Specific Video Summarization. ECCV.
[7] Gygli, M., Grabner, H., Riemenschneider, H., & Gool, L.V. (2014). Creating Summaries from User Videos. ECCV.
[8] Song, Y., Vallmitjana, J., Stent, A., & Jaimes, A. (2015). TVSum: Summarizing web videos using titles. 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 5179-5187.
[9] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., & Rabinovich, A. (2014). Going deeper with convolutions. 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 1-9.
[10] Simonyan, K., & Zisserman, A. (2014). Very Deep Convolutional Networks for Large-Scale Image Recognition. CoRR, abs/1409.1556.
[11] He, K., Zhang, X., Ren, S., & Sun, J. (2015). Deep Residual Learning for Image Recognition. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 770-778.
[12] Hara, K., Kataoka, H., & Satoh, Y. (2017). Learning Spatio-Temporal Features with 3D Residual Networks for Action Recognition. 2017 IEEE International Conference on Computer Vision Workshops (ICCVW), 3154-3160.
[13] Deng, J., Dong, W., Socher, R., Li, L., Li, K., & Fei-Fei, L. (2009). ImageNet: A large-scale hierarchical image database. CVPR 2009.
[14] Kay, W., Carreira, J., Simonyan, K., Zhang, B., Hillier, C., Vijayanarasimhan, S., Viola, F., Green, T., Back, T., Natsev, A., Suleyman, M., & Zisserman, A. (2017). The Kinetics Human Action Video Dataset. ArXiv, abs/1705.06950.
[15] Zhang, K., Chao, W., Sha, F., & Grauman, K. (2016). Video Summarization with Long Short-Term Memory. ArXiv, abs/1605.08110.
[16] Rochan, M., Ye, L., & Wang, Y. (2018). Video Summarization Using Fully Convolutional Sequence Networks. ECCV.
[17] Zhao, B., Li, X., & Lu, X. (2017). Hierarchical Recurrent Neural Network for Video Summarization. Proceedings of the 25th ACM international conference on Multimedia.
[18] Zhao, B., Li, X., & Lu, X. (2018). HSA-RNN: Hierarchical Structure-Adaptive RNN for Video Summarization. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 7405-7414.
[19] Zhang, K., Grauman, K., & Sha, F. (2018). Retrospective Encoders for Video Summarization. ECCV.
[20] Ji, Z., Xiong, K., Pang, Y., & Li, X. (2020). Video Summarization With Attention-Based Encoder–Decoder Networks. IEEE Transactions on Circuits and Systems for Video Technology, 30, 1709-1717.
[21] Fajtl, J., Sokeh, H.S., Argyriou, V., Monekosso, D., & Remagnino, P. (2018). Summarizing Videos with Attention. ACCV Workshops.
[22] Agyeman, R., Muhammad, R., & Choi, G.S. (2019). Soccer Video Summarization Using Deep Learning. 2019 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR), 270-273.
[23] Zhou, K., & Qiao, Y. (2018). Deep Reinforcement Learning for Unsupervised Video Summarization with Diversity-Representativeness Reward. ArXiv, abs/1801.00054.
[24] Mahasseni, B., Lam, M., & Todorovic, S. (2017). Unsupervised Video Summarization with Adversarial LSTM Networks. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2982-2991.
[25] Fu, T., Tai, S., & Chen, H. (2019). Attentive and Adversarial Learning for Video Summarization. 2019 IEEE Winter Conference on Applications of Computer Vision (WACV), 1579-1587.
[26] Liu, Y., Li, Y., Yang, F., Chen, S., & Wang, Y.F. (2019). Learning Hierarchical Self-Attention for Video Summarization. 2019 IEEE International Conference on Image Processing (ICIP), 3377-3381.
[27] Feng, L., Li, Z., Kuang, Z., & Zhang, W. (2018). Extractive Video Summarizer with Memory Augmented Neural Networks. Proceedings of the 26th ACM international conference on Multimedia.
[28] Sahrawat, D., Agarwal, M., Sinha, S., Adhikary, A., Agarwal, M., Shah, R.R., & Zimmermann, R. (2019). Video Summarization using Global Attention with Memory Network and LSTM. 2019 IEEE Fifth International Conference on Multimedia Big Data (BigMM), 231-236.
[29] Ba, J., Kiros, J.R., & Hinton, G.E. (2016). Layer Normalization. ArXiv, abs/1607.06450.
[30] Kingma, D.P., & Ba, J. (2014). Adam: A Method for Stochastic Optimization. CoRR, abs/1412.6980.
[31] Glorot, X., & Bengio, Y. (2010). Understanding the difficulty of training deep feedforward neural networks. AISTATS.
[32] Aubry, M., & Russell, B.C. (2015). Understanding Deep Features with Computer-Generated Imagery. 2015 IEEE International Conference on Computer Vision (ICCV), 2875-2883.
[33] Otani, M., Nakashima, Y., Rahtu, E., & Heikkilä, J. (2019). Rethinking the Evaluation of Video Summaries. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 7588-7596.
|