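Graduate student (English name): Yan-Ru Wang
Thesis title (English): Deconfounded Action Anticipation
Advisor (English name): Winston H. Hsu
Oral defense committee (English names): Wen-Chin Chen, Mei-Chen Yeh, Yi-Ting Chen, Neng-Hao Yu
Keywords (English): Action Anticipation, Causal Intervention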
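Action anticipation asks a model to infer the upcoming future action from an observed video segment, an ability that matters for many intelligent applications such as autonomous driving and assistive robots. Current approaches mostly build anticipation models on information extracted by action recognition models. However, we find that when an action recognition model is simply used to learn the anticipation problem, the model relies too heavily on the action currently taking place in the frame and ignores other important information, such as which objects are present. According to the causal theory proposed by Judea Pearl, inferring a causal relationship between input and output purely from passively observed correlations can be misled by confounders. We actively intervene in the model's original learning pattern so that, before making a prediction, it must first consider the likelihood of each possible action, which reduces its tendency to depend on the action in the frame while overlooking other information in the video. Experimental results show that our proposed comprehenser helps to eliminate this problem and can be applied on top of different action recognition architectures, achieving better performance on all of them.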
Action anticipation, which predicts future actions from observed videos, has gained increasing attention recently. It is essential for applications such as autonomous driving and assistive robotics. Most existing works build their approaches on features extracted from a fixed action recognition model. However, we found that when an action recognition model is used to learn anticipation, it tends to predict the future action by depending merely on the observed actions and neglects other crucial cues in the video content. In this paper, we refer to this problem as "action over-reliance", where the model over-depends on the current action bias. To prevent the model from resorting to this bias, we address the action anticipation task from the perspective of causality. Based on causal inference, we attribute "action over-reliance" to a defect in prior frameworks: they allow a confounding effect to create spurious correlations between observed actions and future actions, which results in poor generalization. To this end, we propose a novel comprehenser module that lets the model explicitly consider the effect of each possible action, weighted by its prior probability. Experimental results show that our adaptable module manages to alleviate the action over-reliance issue of existing models and boosts their performance.
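The abstract does not spell out how the comprehenser is implemented. As a rough illustration of the idea it describes (considering the effect of each possible action, weighted by its prior probability), the following minimal sketch applies the standard backdoor adjustment, P(Y | do(X)) = Σ_a P(Y | X, a) P(a). It assumes the comprehenser corresponds to this adjustment, which the abstract does not confirm. The function name, the variable names, and the toy numbers are hypothetical and are not taken from the thesis.

```python
def backdoor_adjusted_prediction(cond_probs, prior):
    """Average P(future | video, a) over every action stratum a, weighted by P(a).

    cond_probs[a][y]: hypothetical P(future action y | observed video, action a)
    prior[a]:         hypothetical prior probability P(a) of action a
    Both inputs are illustrative assumptions, not quantities from the thesis.
    """
    if len(cond_probs) != len(prior):
        raise ValueError("cond_probs and prior must cover the same actions")
    if abs(sum(prior) - 1.0) > 1e-9:
        raise ValueError("prior must sum to 1")

    num_future = len(cond_probs[0])
    adjusted = [0.0] * num_future
    # Sum over every possible action, not only the one seen in the frame.
    for p_a, row in zip(prior, cond_probs):
        for y, p_y in enumerate(row):
            adjusted[y] += p_a * p_y
    return adjusted


# Toy example: three possible observed actions, two candidate future actions.
cond_probs = [
    [0.9, 0.1],  # P(future | video, a = 0)
    [0.2, 0.8],  # P(future | video, a = 1)
    [0.5, 0.5],  # P(future | video, a = 2)
]
prior = [0.5, 0.3, 0.2]  # assumed action frequencies in the training set

adjusted = backdoor_adjusted_prediction(cond_probs, prior)
print(adjusted)
```

Because every action stratum contributes in proportion to its prior rather than to how strongly it appears in the current clip, the prediction can no longer be driven by the observed action alone.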
