National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)


Detailed Record

Author: Dong-Yi Lin (林東逸)
Title: A Learning Model for Classification of Baseball Videos based on Adaptive Content Selection
Advisor: Ming-Sui Lee (李明穗)
Committee members: Chia-Hung Yeh (葉家宏), Jessy Lee (李界羲)
Oral defense date: 2020-07-22
Degree: Master's
Institution: National Taiwan University
Department: Graduate Institute of Computer Science and Information Engineering
Discipline: Engineering
Field: Electrical Engineering and Computer Science
Thesis type: Academic thesis
Year of publication: 2020
Graduation academic year: 108 (2019-2020)
Language: English
Pages: 30
Keywords: baseball, attentive-LSTM, activity recognition, video classification, adaptive content selection
DOI: 10.6342/NTU202003703
Usage statistics:
  • Cited by: 0
  • Views: 155
  • Downloads: 0
  • Bookmarked: 0
Abstract:
Baseball is one of the most popular sports in the world; it generates enormous business every year, and the technologies around it are booming. MLB-YouTube is a fine-grained action recognition dataset that is more difficult than typical action recognition datasets, because its scenes are very similar and the differences between classes are subtle. In this thesis, we fine-tune an attentive-LSTM model to make it better suited to the MLB-YouTube dataset, and we introduce adaptive content selection to help the model focus on the actions of the players and the umpire. In addition, we make two improvements to the MLB-YouTube dataset. First, the original dataset contains very few bunt and hit-by-pitch videos, so we collected many more videos of these two classes from the Internet to make the dataset more complete. Second, we define new classes based on the events in a baseball game: each event is composed of several activity classes, and the model classifies videos by event. This new class definition also helps improve classification accuracy. The proposed approach outperforms the state-of-the-art by 6.1% mAP under the original class definition and by 17.3% accuracy under the new class definition.
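To make the classification pipeline concrete, the following is a minimal sketch of an attention-pooled LSTM video classifier of the kind the abstract describes. It is an illustration only: the class name, layer names, and dimensions (AttentiveLSTMClassifier, feat_dim, hidden_dim, num_classes) are assumptions, not the thesis's exact architecture, and adaptive content selection (e.g., cropping frames to the players and the umpire) would happen upstream, before feature extraction.

```python
# A minimal sketch (not the thesis's exact model): an LSTM runs over
# per-frame CNN features, a learned attention layer scores each time
# step, and the attention-weighted sum of hidden states is classified.
# All names and dimensions here are illustrative assumptions.
import torch
import torch.nn as nn

class AttentiveLSTMClassifier(nn.Module):
    def __init__(self, feat_dim=1024, hidden_dim=512, num_classes=8):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        self.attn = nn.Linear(hidden_dim, 1)      # one score per time step
        self.head = nn.Linear(hidden_dim, num_classes)

    def forward(self, feats):                     # feats: (B, T, feat_dim)
        h, _ = self.lstm(feats)                   # h: (B, T, hidden_dim)
        w = torch.softmax(self.attn(h), dim=1)    # attention over time
        pooled = (w * h).sum(dim=1)               # weighted temporal pooling
        return self.head(pooled)                  # per-class logits

# Dummy usage: 2 clips, 64 frames each, 1024-d per-frame features.
model = AttentiveLSTMClassifier()
logits = model(torch.randn(2, 64, 1024))
```

Under the event-based class definition described above, the per-activity labels would additionally be grouped so that one label covers a whole event rather than a single action.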
Table of Contents:
Abstract i
List of Figures iv
List of Tables v
1 Introduction 1
2 Related Work 3
3 Improve Dataset 7
3.1 MLB-YouTube dataset 7
3.2 Expand dataset 8
3.3 Define new classes by events 9
4 Approach 12
4.1 Attentive LSTM 12
4.2 Adaptive content selection 15
5 Experiment 19
5.1 Implement detail 19
5.2 Results 19
5.2.1 Compare by using original class definition 20
5.2.2 Compare by using new class definition 21
5.3 Ablation study 22
5.3.1 Adaptive content selection 22
5.3.2 Modification model 23
5.3.3 Expanded MLB-YouTube dataset 24
5.3.4 New class definition 24
5.4 Execution time 25
6 Conclusion 26
6.1 Conclusion 26
6.2 Future work 26
Reference 28