
National Digital Library of Theses and Dissertations in Taiwan


Detailed Record

Author: 王彥茹
Author (English): Yan-Ru Wang
Title: 去除動作預測中的混雜因素
Title (English): Deconfounded Action Anticipation
Advisor: 徐宏民
Advisor (English): Winston H. Hsu
Oral Examination Committee: 陳文進, 葉梅珍, 陳奕廷, 余能豪
Oral Examination Committee (English): Wen-Chin Chen, Mei-Chen Yeh, Yi-Ting Chen, Neng-Hao Yu
Date of Oral Defense: 2021-09-22
Degree: Master's
Institution: National Taiwan University
Department: Graduate Institute of Networking and Multimedia
Discipline: Computing
Field: Networking
Thesis Type: Academic thesis
Year of Publication: 2021
Academic Year of Graduation: 109 (2020–2021)
Language: English
Pages: 24
Keywords (Chinese): 動作預測、因果推斷
Keywords (English): Action Anticipation, Causal Intervention
DOI: 10.6342/NTU202103717
Usage statistics:
  • Cited: 0
  • Views: 116
  • Rating: (none)
  • Downloads: 5
  • Bookmarked: 0
Abstract (translated from Chinese):
Action anticipation asks a model to infer the upcoming action from an observed video clip, a capability that matters for many intelligent applications such as autonomous driving and assistive robots. Most current approaches build on the features extracted by an action recognition model. However, we found that when an action recognition model is used directly to learn action anticipation, it comes to rely too heavily on the action currently taking place in the frame, ignoring other important cues such as which objects are present. As Judea Pearl's causal theory points out, inferring a causal relationship purely from passively observed correlations between input and output can be misled by confounding factors. By actively intervening in the model's original learning procedure, we require it to weigh the probability of each possible action before making a prediction, which reduces its tendency to depend on the on-screen action while ignoring other information in the video. Experimental results show that our proposed comprehenser helps eliminate this problem and can be applied on top of different action recognition architectures, improving performance in every case.
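The intervention the abstract describes matches Pearl's backdoor adjustment. As a hedged sketch (the symbols V, Y, and a are mine, not the thesis's notation), with V the observed video, Y the future action, and a ranging over the possible current actions acting as the confounder, replacing passive conditioning P(Y | V) with the intervened distribution would give:

```latex
% Backdoor adjustment (sketch): sever the confounder's influence by
% summing over every possible current action a, weighted by its prior.
P\big(Y \mid \mathrm{do}(V)\big) = \sum_{a} P(Y \mid V, a)\, P(a)
```

Each candidate action a is considered explicitly and weighted by its prior P(a), which is exactly the behavior the abstract says the comprehenser enforces.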
Abstract (English):
Action anticipation, which predicts future actions from observed videos, has gained increasing attention recently. It is essential for applications such as autonomous driving and assistive robotics. Most existing works build their approaches on features extracted from a fixed action recognition model. However, we found that when an action recognition model is used to learn anticipation, it tends to predict the future action merely from the observed actions, neglecting other crucial cues in the video content. In this paper, we frame this problem as "action over-reliance": the model suffers from over-dependence on the current-action bias. To prevent the model from resorting to this bias, we address the action anticipation task from a causality perspective. Based on causal inference, we attribute action over-reliance to a defect in prior frameworks that gives the confounding effect a chance to induce spurious correlations between observed actions and future actions, ending in poor generalization. To this end, we propose a novel comprehenser module that lets the model explicitly consider the effect of each possible action, weighted by its prior probability. Experimental results show that our adaptable module alleviates the action over-reliance issue of existing models and boosts their performance.
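A minimal sketch of how such a module might be wired in PyTorch, assuming the backdoor sum above. All names here (ComprehenserHead, action_embed, prior) are hypothetical illustrations, not the thesis's actual implementation:

```python
import torch
import torch.nn as nn

class ComprehenserHead(nn.Module):
    """Hypothetical sketch of a deconfounded anticipation head.

    Approximates P(Y | do(V)) = sum_a P(Y | V, a) P(a) by conditioning
    a shared classifier on every candidate current-action embedding and
    averaging the per-action predictions under the action prior.
    """

    def __init__(self, feat_dim: int, num_actions: int, prior: torch.Tensor):
        super().__init__()
        # Learnable embedding for each possible current action (the confounder).
        self.action_embed = nn.Embedding(num_actions, feat_dim)
        # Shared predictor over the fused (video, action) representation.
        self.classifier = nn.Linear(2 * feat_dim, num_actions)
        # P(a): empirical prior over actions, fixed at training time.
        self.register_buffer("prior", prior)  # shape: (num_actions,)

    def forward(self, video_feat: torch.Tensor) -> torch.Tensor:
        # video_feat: (batch, feat_dim) features from a recognition backbone.
        B, A = video_feat.size(0), self.prior.numel()
        v = video_feat.unsqueeze(1).expand(B, A, -1)                 # (B, A, D)
        a = self.action_embed.weight.unsqueeze(0).expand(B, A, -1)   # (B, A, D)
        logits = self.classifier(torch.cat([v, a], dim=-1))          # (B, A, num_actions)
        probs = logits.softmax(dim=-1)                               # P(Y | V, a)
        # Weight each conditioned prediction by P(a) and sum out the confounder.
        return torch.einsum("ban,a->bn", probs, self.prior)          # (B, num_actions)
```

Usage would look like head = ComprehenserHead(feat_dim=512, num_actions=48, prior=counts / counts.sum()), with the prior estimated from training-label frequencies. The classifier is shared across all conditioned branches, so the confounder is summed out at an inference cost linear in the number of actions.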
Table of Contents:
Verification Letter from the Oral Examination Committee i
Acknowledgements ii
摘要 iii
Abstract iv
Contents vi
List of Figures viii
List of Tables ix
Chapter 1 Introduction 1
Chapter 2 Related Work 5
2.1 Action Anticipation 5
2.2 Causality in Vision 6
Chapter 3 Approach 7
3.1 Problem Definition 7
3.2 Causal Intervention 8
3.3 Comprehenser Module 10
Chapter 4 Experiments 12
4.1 Dataset 12
4.2 Implementation Details 12
Chapter 5 Results 14
5.1 Quantitative Analysis 14
5.2 Qualitative Analysis 15
Chapter 6 Conclusion 17
References 18
Appendix A — Breakfast Full Results 23
References:
S. Agethen, H.-C. Lee, and W. H. Hsu. Anticipation of human actions with pose-based fine-grained representations. In 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2019.
J. Carreira and A. Zisserman. Quo vadis, action recognition? A new model and the Kinetics dataset. In CVPR, pages 4724–4733, 2017.
A. Chadha, G. Arora, and N. Kaloty. iPerceive: Applying common-sense reasoning to multi-modal dense video captioning and video question answering. In 2021 IEEE Winter Conference on Applications of Computer Vision (WACV), pages 1–13, 2021.
G. Chen, J. Li, J. Lu, and J. Zhou. Human trajectory prediction via counterfactual analysis. In ICCV, 2021.
D. Epstein, B. Chen, and C. Vondrick. Oops! predicting unintentional action in video. In The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2020.
Y. A. Farha and J. Gall. Uncertainty-aware anticipation of activities. In 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW), pages 1197–1204, 2019.
Y. A. Farha, A. Richard, and J. Gall. When will you do what? Anticipating temporal occurrences of activities. In 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5343–5352, 2018.
C. Feichtenhofer, H. Fan, J. Malik, and K. He. SlowFast networks for video recognition. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), October 2019.
A. Furnari, S. Battiato, and G. Maria Farinella. Leveraging uncertainty to rethink loss functions and evaluation measures for egocentric action anticipation. In Proceedings of the European Conference on Computer Vision (ECCV) Workshops, September 2018.
A. Furnari and G. M. Farinella. What would you expect? Anticipating egocentric actions with rolling-unrolling LSTMs and modality attention. In International Conference on Computer Vision (ICCV), 2019.
H. Gammulle, S. Denman, S. Sridharan, and C. Fookes. Forecasting future action sequences with neural memory networks. BMVC, 2019.
H. Gammulle, S. Denman, S. Sridharan, and C. Fookes. Predicting the future: A jointly learnt model for action anticipation. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), October 2019.
J. Gao, Z. Yang, and R. Nevatia. RED: Reinforced encoder-decoder networks for action anticipation. In BMVC, 2017.
R. Girdhar and K. Grauman. Anticipative Video Transformer. In ICCV, 2021.
Q. Ke, M. Fritz, and B. Schiele. Time-conditioned action anticipation in one shot. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2019.
H. Kuehne, A. B. Arslan, and T. Serre. The language of actions: Recovering the syntax and semantics of goal-directed human activities. In Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR), 2014.
T. Lan, T.-C. Chen, and S. Savarese. A hierarchical representation for future action prediction. In ECCV, pages 689–704, 2014.
C. Li, S. H. Chan, and Y.-T. Chen. Who make drivers stop? Towards driver-centric risk assessment: Risk object identification via causal inference. In 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 10711–10718, 2020.
M. Liu, S. Tang, Y. Li, and J. M. Rehg. Forecasting human-object interaction: Joint prediction of motor attention and actions in first person video. In ECCV, 2020.
T. Mahmud, M. Hasan, and A. K. Roy-Chowdhury. Joint prediction of activity labels and starting times in untrimmed videos. In 2017 IEEE International Conference on Computer Vision (ICCV), pages 5784–5793, 2017.
A. Miech, I. Laptev, J. Sivic, H. Wang, L. Torresani, and D. Tran. Leveraging the present to anticipate the future in videos. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, June 2019.
R. Morais, V. Le, T. Tran, and S. Venkatesh. Learning to abstract and predict human actions. In BMVC, 2020.
G. Nan, R. Qiao, Y. Xiao, J. Liu, S. Leng, H. Zhang, and W. Lu. Interventional video grounding with dual contrastive learning. In CVPR, 2021.
L. Neumann, A. Zisserman, and A. Vedaldi. Future event prediction: If and when. In 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pages 2935–2943, 2019.
Y. Ng and B. Fernando. Forecasting future action sequences with attention: A new approach to weakly supervised action forecasting. IEEE Transactions on Image Processing, 2020.
J. Pearl, M. Glymour, and N. P. Jewell. Causal inference in statistics: A primer. John Wiley & Sons, 2016.
J. Pearl and D. Mackenzie. The book of why: The new science of cause and effect. Basic Books, 2018.
R. R. Selvaraju, M. Cogswell, A. Das, R. Vedantam, D. Parikh, and D. Batra. Grad-CAM: Visual explanations from deep networks via gradient-based localization. In 2017 IEEE International Conference on Computer Vision (ICCV), pages 618–626, 2017.
F. Sener, D. Singhania, and A. Yao. Temporal aggregate representations for long-range video understanding. In European Conference on Computer Vision, pages 154–171. Springer, 2020.
F. Sener and A. Yao. Zero-shot anticipation for instructional activities. In 2019 IEEE/CVF International Conference on Computer Vision (ICCV), pages 862–871, 2019.
C. Sun, A. Shrivastava, C. Vondrick, R. Sukthankar, K. Murphy, and C. Schmid. Relational action forecasting. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2019.
D. Surís, R. Liu, and C. Vondrick. Learning the predictability of the future. In CVPR, 2021.
C. Vondrick, H. Pirsiavash, and A. Torralba. Anticipating visual representations from unlabeled video. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 98–106, Los Alamitos, CA, USA, June 2016. IEEE Computer Society.
T. Wang, J. Huang, H. Zhang, and Q. Sun. Visual commonsense R-CNN. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10760–10770, 2020.
Y. Wu, L. Zhu, X. Wang, Y. Yang, and F. Wu. Learning to anticipate egocentric actions by imagination. IEEE Transactions on Image Processing, 30:1143–1152, 2021.
K. Xu, J. Ba, R. Kiros, K. Cho, A. C. Courville, R. Salakhutdinov, R. S. Zemel, and Y. Bengio. Show, attend and tell: Neural image caption generation with visual attention. In ICML, 2015.
X. Yang, F. Feng, W. Ji, M. Wang, and T.-S. Chua. Deconfounded video moment retrieval with causal intervention. In Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR '21, 2021.