2021
DOI: 10.48550/arxiv.2110.11586
Preprint
Wide and Narrow: Video Prediction from Context and Motion

Abstract: Video prediction, forecasting the future frames from a sequence of input frames, is a challenging task, since view changes are influenced by various factors, such as the global context surrounding the scene and local motion dynamics. In this paper, we propose a new framework to integrate these complementary attributes to predict complex pixel dynamics through deep networks. To capture the local motion pattern of objects, we devise local filter memory networks that generate adaptive filter kernels by storing…
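The abstract mentions generating adaptive filter kernels to capture local motion. As a rough illustration of the general idea of dynamic local filtering (not the paper's actual local filter memory module, whose details are truncated above), the sketch below applies a separately predicted 3x3 kernel at each output pixel instead of one shared convolution kernel; the function and array names are hypothetical:

```python
import numpy as np

def apply_adaptive_filters(frame, kernels):
    """Filter a single-channel frame with one 3x3 kernel per pixel.

    frame:   (H, W) input frame
    kernels: (H, W, 3, 3) per-pixel filter kernels, e.g. predicted
             by a network conditioned on observed motion
    """
    H, W = frame.shape
    padded = np.pad(frame, 1, mode="edge")  # replicate border pixels
    out = np.empty((H, W), dtype=float)
    for i in range(H):
        for j in range(W):
            patch = padded[i:i + 3, j:j + 3]
            out[i, j] = np.sum(patch * kernels[i, j])  # local weighted sum
    return out

# Identity kernels (1 at the center) reproduce the input frame exactly.
frame = np.arange(12.0).reshape(3, 4)
kernels = np.zeros((3, 4, 3, 3))
kernels[:, :, 1, 1] = 1.0
assert np.allclose(apply_adaptive_filters(frame, kernels), frame)
```

Because each output pixel gets its own kernel, the predicted filters can encode spatially varying motion (e.g. shifting different objects in different directions), which a single shared convolution kernel cannot.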

Cited by 1 publication (7 citation statements)
References: 44 publications
“…Input Encoding Methods in video prediction typically consist of CNNs with a Unet (Ronneberger et al, 2015) type architecture (e.g., Cho et al, 2021;Ho et al, 2019;Bhattacharjee & Das, 2019;Kwon & Park, 2019;Ying et al, 2019). CNNs are an obvious choice for encoding the sequences of frames within a video due to their effectiveness on image-based data (Russakovsky et al, 2015), and the U-net architecture is an especially versatile form of CNN which can be applied to any application where the 'labels' are of the same format (e.g.…”
Section: Video Prediction
confidence: 99%
“…an image with the same height and width) as the inputs (Ronneberger et al, 2015). Since this is the case for video prediction, U-net CNNs can either be used straightforwardly for encoding purposes (e.g., Cho et al, 2021;Ho et al, 2019) or as the generator within a GAN framework (Isola et al, 2018) to perform adversarial learning (e.g., Bhattacharjee & Das, 2019;Kwon & Park, 2019;Ying et al, 2019).…”
Section: Video Prediction
confidence: 99%
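The citation statements above note that U-net CNNs fit video prediction because the output ("label") has the same height and width as the input frame. A minimal sketch of that shape-preserving encoder-decoder structure, reduced to pooling, upsampling, and a skip connection (no learned weights — this is an illustration of the architecture's shape behavior, not an implementation from the cited papers):

```python
import numpy as np

def down(x):
    """Contracting path step: 2x2 average pooling halves each spatial dim."""
    H, W = x.shape
    return x.reshape(H // 2, 2, W // 2, 2).mean(axis=(1, 3))

def up(x):
    """Expanding path step: nearest-neighbour upsampling doubles each dim."""
    return np.repeat(np.repeat(x, 2, axis=0), 2, axis=1)

def tiny_unet(x):
    """One-level U-net skeleton: downsample, upsample, add the skip.

    The skip connection reinjects full-resolution detail lost by pooling,
    and the output always matches the input's height and width.
    """
    skip = x
    bottleneck = down(x)
    return up(bottleneck) + skip

frame = np.random.rand(8, 8)
assert tiny_unet(frame).shape == frame.shape  # output format == input format
```

This input-output shape match is exactly why such encoders can either predict frames directly or serve as the generator inside a GAN framework, as the cited works do.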