Graduate student: Erick Hendra Putra Alwando
Thesis title (English): Efficient Multiple Path Search for Action Tube Detection in Videos
Advisors (English): Wen-Hsien Fang, Yie-Tarng Chen
Oral defense committee (English): Chien-Ching Chiu, Kuen-Tsair Lay
Keywords: action localization, convolutional neural networks (CNN), multiple path search, localization refinement, object detection
This thesis presents an efficient convolutional neural network (CNN)-based approach for detecting multiple spatio-temporal action tubes in videos. First, a new fusion strategy combines the appearance and optical-flow information from two-stream CNNs with motion saliency to generate action detection scores. Next, an efficient multiple path search (MPS) algorithm is developed to find multiple paths simultaneously in a single run. In the forward message-passing stage of MPS, each node stores a prescribed number of paths, selected according to the scores accumulated over the previous stages. A backward path-tracing stage then recovers all of the paths at once by fully reusing the information generated in the forward pass, without repeating the search, which reduces the computational complexity. Moreover, to correct potentially inaccurate bounding boxes, a video localization refinement (VLR) scheme is introduced to further improve detection accuracy. Simulations show that the proposed MPS outperforms the main state-of-the-art methods on the widely used UCF-101 and J-HMDB datasets, and that combining MPS with VLR improves its performance further.
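The forward/backward structure of MPS described above resembles a K-best (list) Viterbi search over a trellis of per-frame candidate boxes. The Python sketch below illustrates that idea under stated assumptions: each frame supplies candidate boxes with an already fused action score, consecutive boxes are linked by a hypothetical IoU bonus weighted by `lam`, and `K` plays the role of the prescribed number of paths stored per node. The function `multiple_path_search` and its linking score are illustrative only, not the thesis's exact formulation.

```python
import heapq


def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def multiple_path_search(boxes, scores, K, lam=1.0):
    """Hypothetical K-best path search over per-frame detections.

    boxes[t][i]  : box i in frame t (every frame is assumed non-empty)
    scores[t][i] : its fused action score (assumed precomputed)
    K            : number of partial paths kept per node
    lam          : assumed weight of the IoU linking bonus
    Returns up to K (total_score, [box index per frame]) tuples, best first.
    """
    T = len(boxes)
    # Forward pass: table[t][j] holds up to K entries
    # (accumulated score, predecessor node, predecessor rank).
    table = [[[(scores[0][i], -1, -1)] for i in range(len(boxes[0]))]]
    for t in range(1, T):
        layer = []
        for j, bj in enumerate(boxes[t]):
            cands = []
            for i, bi in enumerate(boxes[t - 1]):
                link = lam * iou(bi, bj)
                for r, (acc, _, _) in enumerate(table[t - 1][i]):
                    cands.append((acc + scores[t][j] + link, i, r))
            # Keep only the K best partial paths ending at node j.
            layer.append(heapq.nlargest(K, cands))
        table.append(layer)

    # Backward pass: rank all stored endpoints in the last frame once,
    # then follow the stored back-pointers; no search is repeated.
    ends = [(entry[0], j, r)
            for j, node in enumerate(table[-1])
            for r, entry in enumerate(node)]
    paths = []
    for total, j, r in heapq.nlargest(K, ends):
        path = [j]
        for t in range(T - 1, 0, -1):
            _, j, r = table[t][j][r]  # step to the predecessor entry
            path.append(j)
        paths.append((total, path[::-1]))
    return paths
```

Because every node keeps its K best partial paths and the linking score is additive, one forward pass followed by K backtraces yields the K best complete paths, rather than running K separate searches. The paths returned by this sketch may share boxes; how the thesis handles overlapping paths is not stated in the abstract.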
Abstract
Related Publications
Acknowledgment
Table of contents
List of Figures
List of Tables
1 Introduction
2 Related work
3 Methods
  3.1 Overall Methodology
  3.2 CNNs-based Action Classifiers
  3.3 Video Localization Refinement
  3.4 Fusion Strategy
  3.5 Multiple Path Search Algorithm
4 Experimental Results
  4.1 Datasets
  4.2 Evaluation Protocol and Experimental Setup
  4.3 Performance Evaluation
    4.3.1 The New Fusion Strategy
    4.3.2 Impact of K Parameter
  4.4 Comparisons with the State-of-the-Art Methods
  4.5 Computation Time Analysis
5 Conclusion and Future Work
  5.1 Conclusion
  5.2 Future Work
Appendix A: Example images from the datasets
References
