|
[1] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” nature, vol. 521, no.7553, p. 436, 2015. [2] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep convolutional neural networks,” in Advances in neural information processing systems, pp. 1097–1105, 2012. [3] A. Voulodimos, N. Doulamis, A. Doulamis, and E. Protopapadakis, “Deep learning for computer vision: A brief review,” Computational intelligence and neuroscience, vol. 2018, 2018. [4] K. Simonyan and A. Zisserman, “Two-stream convolutional networks for action recognition in videos,” in Advances in neural information processing systems, pp. 568–576, 2014. [5] J. Donahue, L. Anne Hendricks, S. Guadarrama, M. Rohrbach, S. Venugopalan, K. Saenko, and T. Darrell, “Long-term recurrent convolutional networks for visual recognition and description,” in Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 2625–2634, 2015. [6] D. Tran, L. Bourdev, R. Fergus, L. Torresani, and M. Paluri, “Learning spatiotemporal features with 3d convolutional networks,” in Proceedings of the IEEE international conference on computer vision, pp. 4489–4497, 2015. [7] K. Lee and D. P. Ellis, “Audio-based semantic concept classification for consumer video,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 18, no. 6, pp. 1406–1416, 2009. [8] M. Xu, N. C. Maddage, C. Xu, M. Kankanhalli, and Q. Tian, “Creating audio keywords for event detection in soccer video,” in 2003 International Conference on Multimedia and Expo. ICME’03. Proceedings (Cat. No. 03TH8698), vol. 2, pp. II–281, IEEE, 2003. [9] J. Cao, T. Zhao, J. Wang, R. Wang, and Y. Chen, “Excavation equipment classification based on improved mfcc features and elm,” Neurocomputing, vol. 261, pp. 231–241, 2017. [10] I. Hong, Y. Ko, H. Shin, and Y. Kim, “Emotion recognition from korean language using mfcc hmm and speech speed,” in The 12th International Conference on Multimedia Information Technology and Applications (MITA2016), pp. 12–15, 2016. [11] C. T. Duong, R. Lebret, and K. Aberer, “Multimodal classification for analysing social media,” arXiv preprint arXiv:1708.02099, 2017. [12] T. Hasan, H. Bořil, A. Sangwan, and J. H. Hansen, “Multi-modal highlight generation for sports videos using an information-theoretic excitability measure,” EURASIP Journal on Advances in Signal Processing, vol. 2013, no. 1, p. 173, 2013. [13] U. G. Mangai, S. Samanta, S. Das, and P. R. Chowdhury, “A survey of decision fusion and feature fusion strategies for pattern classification,” IETE Technical review, vol. 27, no. 4, pp. 293–307, 2010. [14] E. Podrug and A. Subasi, “Surface emg pattern recognition by using dwt feature extraction and svm classifier,” in The 1st Conference of Medical and Biological Engineering in Bosnia and Herzegovina (CMBEBIH 2015), pp. 13–15, 2015. [15] D. M. Vo and T. H. Le, “Deep generic features and svm for facial expression recognition,” in 3rd National Foundation for Science and Technology Development Conference on Information and Computer Science (NICS), pp. 80–84, IEEE, 2016. [16] J. Huang, G. Li, Q. Huang, and X. Wu, “Learning label specific features for multi-label classification,” in 2015 IEEE International Conference on Data Mining, pp. 181–190, IEEE, 2015. [17] A. Karpathy, G. Toderici, S. Shetty, T. Leung, R. Sukthankar, and L. Fei-Fei, “Large-scale video classification with convolutional neural networks,” in Proceedings of the IEEE conference on Computer Vision and Pattern Recognition, pp. 1725–1732, 2014. [18] H. Harb and L. Chen, “Highlights detection in sports videos based on audio analysis,” in Proceedings of the Third International Workshop on Content-Based Multimedia Indexing CBMI03, September, pp. 22–24, 2003. [19] M.-L. Zhang, Y.-K. Li, X.-Y. Liu, and X. Geng, “Binary relevance for multi-label learning: an overview,” Frontiers of Computer Science, vol. 12, no. 2, pp. 191–202, 2018.
|