2019 IEEE/CVF International Conference on Computer Vision (ICCV)
DOI: 10.1109/iccv.2019.00910

Disentangling Propagation and Generation for Video Prediction

Abstract: A dynamic scene has two types of elements: those that move fluidly and can be predicted from previous frames, and those which are disoccluded (exposed) and cannot be extrapolated. Prior approaches to video prediction typically learn either to warp or to hallucinate future pixels, but not both. In this paper, we describe a computational model for high-fidelity video prediction which disentangles motion-specific propagation from motion-agnostic generation. We introduce a confidence-aware warping operator which g…
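To make the disentanglement described in the abstract concrete, below is a minimal PyTorch-style sketch of how motion-specific propagation (warping the previous frame with predicted flow) and motion-agnostic generation (hallucinating disoccluded pixels) might be composed through a confidence map. All function and module names (backward_warp, predict_next_frame, generator) are illustrative assumptions, not the authors' released code.

# Minimal sketch: propagate visible pixels by warping, hallucinate the rest.
# Module names and signatures are hypothetical placeholders.
import torch
import torch.nn.functional as F

def backward_warp(frame, flow):
    """Warp `frame` (N,C,H,W) with a backward optical flow field (N,2,H,W)."""
    n, _, h, w = frame.shape
    # Build a normalized sampling grid in [-1, 1] as required by grid_sample.
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    base = torch.stack((xs, ys), dim=0).float().to(frame.device)   # (2,H,W)
    coords = base.unsqueeze(0) + flow                               # (N,2,H,W)
    grid_x = 2.0 * coords[:, 0] / (w - 1) - 1.0
    grid_y = 2.0 * coords[:, 1] / (h - 1) - 1.0
    grid = torch.stack((grid_x, grid_y), dim=-1)                    # (N,H,W,2)
    return F.grid_sample(frame, grid, align_corners=True)

def predict_next_frame(prev_frame, flow, confidence, generator):
    """Composite a predicted frame from propagated and generated pixels.

    `confidence` (N,1,H,W) in [0,1] marks pixels that can be propagated from
    the previous frame; low-confidence (disoccluded) pixels are filled in by
    the `generator` callable (a hypothetical inpainting network).
    """
    warped = backward_warp(prev_frame, flow)
    hallucinated = generator(warped * confidence, confidence)
    return confidence * warped + (1.0 - confidence) * hallucinated

The composite in the last line plays the role of a confidence-aware warp: wherever the flow-based warp is reliable its output is kept, and elsewhere the generated content takes over.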

Cited by 86 publications (81 citation statements); references 42 publications.
“…Apart from using a single method, some studies [4], [17], [18] combine optical flow information and pixel generation information in the model. Liang et al. [4] used the idea of dual learning, jointly training a video frame prediction task and an optical flow prediction task to alleviate the accumulation of error during frame prediction.…”
Section: Related Work (mentioning)
confidence: 99%
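A rough sketch of the joint ("dual") training idea described in the excerpt above is shown below. The networks frame_net and flow_net, the use of an L1 objective, and the weighting are all assumptions for illustration, not Liang et al.'s implementation.

# Sketch of jointly training frame prediction and flow prediction so the two
# tasks regularize each other. Networks and loss weights are hypothetical.
import torch
import torch.nn.functional as F

def dual_step(frame_net, flow_net, frames_in, frame_gt, flow_gt, alpha=1.0):
    pred_frame = frame_net(frames_in)      # predict the next RGB frame
    pred_flow = flow_net(frames_in)        # predict the next optical flow field
    frame_loss = F.l1_loss(pred_frame, frame_gt)
    flow_loss = F.l1_loss(pred_flow, flow_gt)
    # A shared objective couples the two tasks during backpropagation.
    return frame_loss + alpha * flow_loss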
“…In addition, the prediction quality of the model depends on the motion trajectory vector provided by the user, but it is difficult to require users to provide accurate motion trajectory vectors for video prediction. Gao et al. [18] proposed a framework that decouples the optical flow prediction model from the pixel generation model. The framework first obtains a predicted optical flow map through an optical flow estimation module, then derives an occlusion map from the pixel-density changes implied by that flow, masks the frame warped by the estimated flow with the occlusion map, and finally uses a filling module to inpaint the occluded content and produce the final video frame.…”
Section: Related Work (mentioning)
confidence: 99%
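The distinctive step in the pipeline summarized above is deriving the occlusion map from the predicted flow. Below is a hedged illustration of one way to do this: target pixels that receive no source pixel under the forward flow are marked as disoccluded holes to be filled by the generation module. This is an illustrative approximation, not the authors' exact operator.

# Illustrative occlusion-map derivation from a predicted forward flow:
# count how many source pixels land on each target pixel; zero coverage
# means the target pixel is disoccluded and must be inpainted.
import torch

def occlusion_from_forward_flow(flow):
    """flow: (N,2,H,W) forward flow. Returns a (N,1,H,W) map, 1 = disoccluded."""
    n, _, h, w = flow.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    tx = torch.round(xs.to(flow) + flow[:, 0]).long().clamp(0, w - 1)   # target x
    ty = torch.round(ys.to(flow) + flow[:, 1]).long().clamp(0, h - 1)   # target y
    density = torch.zeros(n, h * w, device=flow.device)
    idx = (ty * w + tx).view(n, -1)                                      # flat target indices
    density.scatter_add_(1, idx, torch.ones_like(idx, dtype=density.dtype))
    density = density.view(n, 1, h, w)
    return (density == 0).float()    # no incoming pixels -> hole to be filled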
“…Others split the problem into two sub-problems, motion and content prediction, and learn separate representations for the static and dynamic components. For training, these approaches either use a motion prior, such as optical flow information [9, 20, 23, 26-28], as a conditional input or use learned features to represent pixel dynamics [29].…”
Section: Related Work (mentioning)
confidence: 99%
“…To make predictions more realistic, others tackled the problem by learning separate representations for the static and dynamic components of a video. This is done either by incorporating motion conditions, such as optical flow information [12], [34], [15], [17], [25], or by learning sparse features that represent pixel dynamics [26]. Decomposing the video into static and non-static components allows the network to simply reproduce the values of the static part for the majority of pixels.…”
Section: Related Work (mentioning)
confidence: 99%