National Digital Library of Theses and Dissertations in Taiwan
Student: He, Jiabin (何嘉斌)
Thesis title: Multi-modal, Multi-labeled Sport Highlight Extraction (多模態及多標籤的運動精彩片段擷取)
Advisor: Hsing-Kuo Pao (鮑興國)
Committee members: Yuh-Jye Lee, Tien-Ruey Hsiang
Oral defense date: 2019-06-27
Degree: Master
Institution: National Taiwan University of Science and Technology
Department: Department of Computer Science and Information Engineering
Discipline: Engineering
Field: Electrical Engineering and Computer Science
Document type: Academic thesis
Publication year: 2019
Graduation academic year: 107
Language: English
Number of pages: 52
Keywords: multi-modal learning, multi-label learning, fusion strategy, feature representation, video classification
Citation count: 0
Views: 127
Downloads: 0
Bookmarks: 0
Abstract:
Advances in technology have made the creation and dissemination of multimedia faster and more convenient, and the amount of video on the Internet grows daily. How to efficiently search a huge pool of videos for the ones we need, and how to quickly locate the relevant content within a lengthy video, are important research directions in computer vision and video understanding. Basketball is a widely loved sport, so videos of basketball games are too numerous to enumerate. If we can recognize the highlights in a basketball game video, viewers can save a great deal of viewing time while enjoying the same pleasure.
A basketball game video contains information in several modalities, such as images, audio, the score, and the game clock, and different algorithms have been developed to analyze each modality. We aim to fuse the video's multi-modal features so that more comprehensive and richer information can be used to recognize basketball highlights. To this end, we explore two fusion strategies: latent feature fusion and early feature fusion.
In addition, we train a multi-label model on the factors that affect how exciting a segment is, extract the joint features of these factors, and add them to the multi-modal model to further improve performance. We call this method multi-modal, multi-label based classification.
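The two fusion strategies can be illustrated with a minimal sketch (the feature dimensions, toy linear encoders, and weight matrices below are hypothetical, for illustration only): early fusion concatenates the raw per-clip features of each modality before any model sees them, while latent fusion first encodes each modality separately and concatenates the latent representations.

```python
import numpy as np

def early_fusion(visual, audio):
    """Early fusion: concatenate raw per-clip features before any model."""
    return np.concatenate([visual, audio], axis=-1)

def latent_fusion(visual, audio, encode_v, encode_a):
    """Latent fusion: encode each modality separately, then concatenate
    the latent representations."""
    return np.concatenate([encode_v(visual), encode_a(audio)], axis=-1)

rng = np.random.default_rng(0)
visual = rng.standard_normal((10, 8))   # 10 clips, 8-dim visual features
audio  = rng.standard_normal((10, 6))   # 10 clips, 6-dim audio features

# Toy linear encoders projecting each modality into a 4-dim latent space
W_v = rng.standard_normal((8, 4))
W_a = rng.standard_normal((6, 4))

fused_early  = early_fusion(visual, audio)                         # (10, 14)
fused_latent = latent_fusion(visual, audio,
                             lambda x: x @ W_v, lambda x: x @ W_a)  # (10, 8)
```

In practice the encoders would be the modality-specific networks (e.g. a CNN for frames, an MFCC-based model for audio) rather than linear projections; the shape difference shows the trade-off, since early fusion preserves raw detail while latent fusion fuses compact, modality-tuned representations.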
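The multi-label augmentation can be sketched as follows (the factor names, dimensions, and the logistic multi-label head are assumptions for illustration, not the thesis's actual architecture): a trained multi-label head scores each highlight-affecting factor, and its per-factor outputs serve as joint features appended to the fused multi-modal features before the final classifier.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
n_clips, d_fused, n_labels = 10, 14, 3

# Fused multi-modal features per clip, as produced by a fusion step
fused = rng.standard_normal((n_clips, d_fused))

# Hypothetical trained multi-label head: one logistic unit per factor
# (e.g. "dunk", "crowd cheering", "score change" -- names are illustrative)
W_ml = rng.standard_normal((d_fused, n_labels))

joint = sigmoid(fused @ W_ml)                        # per-factor scores in (0, 1)
augmented = np.concatenate([fused, joint], axis=-1)  # input to final classifier
```

The final highlight classifier then consumes `augmented`, so the per-factor scores act as learned joint features rather than as the end prediction.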
Table of Contents:
1 Introduction
2 Related Work
3 Methodology
  3.1 Visual-based Classification
    3.1.1 Multi-branch Convolutional Networks
    3.1.2 Multi-channel Convolutional Networks
    3.1.3 3D Convolutional Networks
    3.1.4 Long-term Recurrent Convolutional Networks
  3.2 Audio-based Classification
  3.3 Multi-modal based Classification
    3.3.1 Latent Features Fusion
    3.3.2 Early Features Fusion
  3.4 Multi-modal Multi-label based Classification
4 Experiments and Results
  4.1 Dataset
    4.1.1 Data Collection
    4.1.2 Data Preprocessing
  4.2 Unimodal Model
  4.3 Multi-modal Model
  4.4 Multi-modal Multi-label Model
  4.5 Highlight Extraction and Evaluation
5 Conclusions
References