2019 IEEE/CVF International Conference on Computer Vision (ICCV) 2019
DOI: 10.1109/iccv.2019.00566

Predicting the Future: A Jointly Learnt Model for Action Anticipation

Abstract: Inspired by human neurological structures for action anticipation, we present an action anticipation model that enables the prediction of plausible future actions by forecasting both the visual and temporal future. In contrast to current state-of-the-art methods which first learn a model to predict future video features and then perform action anticipation using these features, the proposed framework jointly learns to perform the two tasks, future visual and temporal representation synthesis, and early action …

Citations: cited by 73 publications (44 citation statements)
References: 54 publications (99 reference statements)
“…The average and the median of the gap are 21 seconds and 14 seconds, respectively. Thus, the forecasting gaps in this benchmarks are substantially longer than those used in other action anticipation tasks [19,23,45]. This makes this benchmark particularly challenging as the model is asked to predict the step of segments far away in the future compared to the observed history.…”
Section: F Further Details About Step Forecasting (mentioning)
confidence: 99%
“…Video prediction is an emerging research field of computer vision [49]-[51]. It has been successfully applied in various applications such as action anticipation [52], prediction of object locations [53], trajectory prediction [54], anomaly detection [55] and many more. Given a sequence of previous frames, the target of video prediction is to reason and predict about the subsequent frame(s) based on the analysis of rich spatio-temporal features in a video, e.g., object/background information or regularity of pixel changes [51].…”
Section: Video Frame Prediction (mentioning)
confidence: 99%
“…Misra et al [40] introduce the idea of learning such visual representations by estimating the order of shuffled video frames. Inspired by the success of this approach, several recent papers focused on designing a novel pretext task using temporal information, such as predicting future frames [13,49,54] or their embeddings [21,27]; estimating the order of frames [10,20,36,40,57] or the direction of video [56]. Another line of research focuses on using temporal coherence [6,24,26,41,62,63] as supervision signal.…”
Section: Related Work (mentioning)
confidence: 99%