2018
DOI: 10.1007/978-3-030-01234-2_44

SDC-Net: Video Prediction Using Spatially-Displaced Convolution

Fig. 1. Frame prediction on a YouTube video frame featuring a panning camera. Left to right: MCNet [34] result and our SDC-Net result. The SDC-Net predicted frame is sharper and preserves fine image details, while color distortion and blurriness are seen in the tree and text in MCNet's predicted frame.

Abstract: We present an approach for high-resolution video frame prediction by conditioning on both past frames and past optical flows. Previous approaches rely on resampling past frames, guided by a learned fu…
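The abstract refers to prior approaches that resample past frames under a learned optical flow. As a rough illustration only, the sketch below shows such flow-guided resampling (backward warping) in PyTorch; the function name, tensor shapes, and pixel-unit flow convention are assumptions for illustration, not details taken from the paper.

```python
# Hypothetical sketch of flow-based frame resampling: warp the most recent
# frame with a predicted backward optical flow to synthesize the next frame.
# Shapes and the flow convention (pixel units, pointing from t+1 back to t)
# are assumptions, not the paper's exact formulation.
import torch
import torch.nn.functional as F

def warp_frame(frame: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    """frame: (N, C, H, W); flow: (N, 2, H, W) in pixel units."""
    n, _, h, w = frame.shape
    # Base sampling grid in pixel coordinates.
    ys, xs = torch.meshgrid(
        torch.arange(h, dtype=frame.dtype, device=frame.device),
        torch.arange(w, dtype=frame.dtype, device=frame.device),
        indexing="ij",
    )
    grid_x = xs.unsqueeze(0) + flow[:, 0]  # horizontal displacement
    grid_y = ys.unsqueeze(0) + flow[:, 1]  # vertical displacement
    # Normalize to [-1, 1] as required by grid_sample.
    grid_x = 2.0 * grid_x / (w - 1) - 1.0
    grid_y = 2.0 * grid_y / (h - 1) - 1.0
    grid = torch.stack((grid_x, grid_y), dim=-1)  # (N, H, W, 2)
    return F.grid_sample(frame, grid, mode="bilinear",
                         padding_mode="border", align_corners=True)
```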

Cited by 98 publications (58 citation statements)
References 36 publications
“…We hope our approach inspires other ways to perform data augmentation, such as GANs [26], to enable cheap dataset collection and achieve improved accuracy in target tasks. For future work, we would like to explore soft label relaxation using the learned kernels in [34] for better uncertainty reasoning. Our state-of-the-art implementation will be made publicly available to the research community.…”
Section: Results (mentioning)
confidence: 99%
“…In our implementation, we use the vector-based architecture as described in [34]. G is a fully convolutional U-net architecture, complete with an encoder and decoder and skip connections between encoder/decoder layers of the same output dimensions.…”
Section: A. Implementation Details of Our Video Prediction/Reconstruction (mentioning)
confidence: 99%
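The quoted statement describes the generator G as a fully convolutional U-Net with skip connections between encoder and decoder layers of matching resolution. A minimal sketch of such a generator follows; the channel widths, depth, activations, and up-sampling choices are illustrative assumptions, not the cited implementation.

```python
# Hypothetical U-Net-style generator G: encoder/decoder with skip connections
# between layers of the same spatial size. Assumes input H and W divisible by 8.
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch):
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.LeakyReLU(0.1, inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.LeakyReLU(0.1, inplace=True),
    )

class UNetG(nn.Module):
    def __init__(self, in_ch=3, out_ch=3, widths=(32, 64, 128)):
        super().__init__()
        self.enc = nn.ModuleList()
        ch = in_ch
        for w in widths:                      # encoder path
            self.enc.append(conv_block(ch, w))
            ch = w
        self.pool = nn.AvgPool2d(2)
        self.bottleneck = conv_block(widths[-1], widths[-1] * 2)
        self.up = nn.ModuleList()
        self.dec = nn.ModuleList()
        ch = widths[-1] * 2
        for w in reversed(widths):            # decoder path
            self.up.append(nn.ConvTranspose2d(ch, w, 2, stride=2))
            self.dec.append(conv_block(2 * w, w))  # concat with matching skip
            ch = w
        self.head = nn.Conv2d(ch, out_ch, 1)

    def forward(self, x):
        skips = []
        for enc in self.enc:
            x = enc(x)
            skips.append(x)                   # keep feature map for skip
            x = self.pool(x)
        x = self.bottleneck(x)
        for up, dec, skip in zip(self.up, self.dec, reversed(skips)):
            x = dec(torch.cat([up(x), skip], dim=1))
        return self.head(x)
```

Feeding such a generator the stacked past frames and flows, and reading out per-pixel flow and kernel maps, would follow this template; the exact inputs and outputs of the cited model are not specified here.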
“…On the contrary, motion-based methods [8,9] excel in making sharp predictions, yet fail in occlusion areas where motion predictions are erroneous or ill-defined. Meanwhile, Reda et al. [34] propose to model moving appearances with both convolutional kernels as in [10] and vectors as optical flow. Our closest prior work is [11], which also composes the pixel- and flow-based predictions through occlusion maps.…”
Section: High-fidelity Video Prediction (mentioning)
confidence: 99%
“…[28] untangles the memory of the past from the prediction of the future by learning to predict sampling kernels. [22] combines flow-based and kernel-based approaches to learn a model to predict a motion vector and a kernel simultaneously for each pixel.…”
Section: Related Work (mentioning)
confidence: 99%
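The two statements above describe the paper's hybrid of per-pixel motion vectors and per-pixel adaptive kernels. As an illustration only, the sketch below approximates that idea by first warping the previous frame with the predicted vectors (e.g. using the warp_frame helper sketched earlier) and then applying a predicted k x k kernel at every pixel; the warp-then-filter decomposition, kernel size, and softmax normalization are assumptions, not the paper's exact operator.

```python
# Hypothetical per-pixel adaptive filtering step applied after flow warping,
# approximating a vector-plus-kernel (spatially-displaced convolution) style
# synthesis. kernels holds one k x k weight map per output pixel.
import torch
import torch.nn.functional as F

def adaptive_kernel_filter(warped: torch.Tensor, kernels: torch.Tensor, k: int = 5):
    """warped: (N, C, H, W); kernels: (N, k*k, H, W)."""
    n, c, h, w = warped.shape
    kernels = torch.softmax(kernels, dim=1)                    # normalize weights per pixel
    patches = F.unfold(warped, kernel_size=k, padding=k // 2)  # (N, C*k*k, H*W)
    patches = patches.view(n, c, k * k, h, w)
    return (patches * kernels.unsqueeze(1)).sum(dim=2)         # (N, C, H, W)
```

In a full model, a network such as the U-Net sketched above could predict both the flow field and the k*k kernel maps from the stacked past frames and flows, with the warp and adaptive filter applied on top of the most recent frame.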