2020
DOI: 10.48550/arxiv.2003.04035
Preprint

Transformation-based Adversarial Video Prediction on Large-Scale Data

Abstract: Recent breakthroughs in adversarial generative modeling have led to models capable of producing video samples of high quality, even on large and complex datasets of real-world video. In this work, we focus on the task of video prediction, where given a sequence of frames extracted from a video, the goal is to generate a plausible future sequence. We first improve the state of the art by performing a systematic empirical study of discriminator decompositions and proposing an architecture that yields faster conv…

Cited by 7 publications (15 citation statements)
References 33 publications
“…Quantitatively, Table 1 shows FVD results on BAIR, compared against our baselines: SV2P (Babaeizadeh et al, 2017), SAVP (Lee et al, 2018), DVD-GAN-FP (Clark et al, 2019), Video Transformer (Weissenborn et al, 2019), Latent Video Transformer (LVT) (Rakhimov et al, 2020), and TrIVD-GAN (Luc et al, 2020). We can see that our method is able to generate realistic-looking samples. In addition, we see that VideoGPT is able to sample different trajectories from the same initial frame, showing that it is not simply copying the dataset.…”
Section: BAIR Robot Pushing
confidence: 97%
“…1. On the widely benchmarked BAIR Robot Pushing dataset (Ebert et al, 2017), VideoGPT can generate realistic samples that are competitive with existing methods such as TrIVD-GAN (Luc et al, 2020), achieving an FVD of 103 when benchmarked with real samples, and an FVD* (Razavi et al, 2019) of 94 when benchmarked with reconstructions.…”
Section: Introduction
confidence: 99%
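The FVD scores quoted above are Fréchet distances between feature statistics of real and generated videos (the features come from a pretrained I3D network). As an illustrative sketch only, assuming the feature-extraction step has already produced two arrays of per-video features, the distance itself could be computed as below; `frechet_distance` is a hypothetical helper, not the official FVD implementation:

```python
import numpy as np

def frechet_distance(feats_real, feats_gen):
    """Fréchet distance between Gaussians fit to two feature sets.

    FVD applies this distance to I3D features of real vs. generated
    videos; the feature extraction is assumed to happen upstream.
    """
    mu_r, mu_g = feats_real.mean(axis=0), feats_gen.mean(axis=0)
    cov_r = np.cov(feats_real, rowvar=False)
    cov_g = np.cov(feats_gen, rowvar=False)
    diff = mu_r - mu_g
    # Trace of the matrix square root of cov_r @ cov_g, via eigenvalues
    # (clipped at zero to guard against small negative numerical noise).
    eigvals = np.linalg.eigvals(cov_r @ cov_g)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    return float(diff @ diff + np.trace(cov_r) + np.trace(cov_g) - 2.0 * tr_sqrt)
```

Identical feature sets give a distance of zero; shifting every feature by a constant offset changes only the mean term, which is one reason the metric is sensitive to how the reference statistics (real samples vs. reconstructions, as in FVD vs. FVD*) are chosen.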
“…However, no performance comparison with previous works has been conducted. Improving on [127], Luc et al [128] proposed the Transformation-based & TrIple Video Discriminator GAN (TrIVD-GAN-FP), featuring a novel recurrent unit that computes the parameters of a transformation used to warp previous hidden states without any supervision. These Transformation-based Spatial Recurrent Units (TSRUs) are generic modules and can replace any traditional recurrent unit in existing video prediction approaches.…”
Section: Kernel-based Resampling
confidence: 99%
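The TSRU idea summarized above (predict transformation parameters, then warp the previous hidden state rather than regenerate it) can be caricatured in a few lines. Everything here is an assumption for illustration: the names `tsru_step` and `W_theta` are hypothetical, and the warp is reduced to an integer translation, far simpler than the learned warping in the actual paper.

```python
import numpy as np

def tsru_step(h_prev, x, W_theta):
    """One step of a transformation-based recurrent unit (toy sketch).

    Unlike a ConvGRU, which computes a new hidden state directly, this
    unit predicts the parameters of a spatial transformation from the
    input and previous state, then applies that transformation to warp
    h_prev. Here the "transformation" is just a 2D integer translation.
    """
    # Pool [h_prev, x] over space into a feature vector, then predict
    # two warp parameters (a vertical and horizontal offset).
    feats = np.concatenate([h_prev.mean(axis=(0, 1)), x.mean(axis=(0, 1))])
    dy, dx = np.round(W_theta @ feats).astype(int)
    # Warp: shift the previous hidden state by the predicted offset.
    return np.roll(h_prev, shift=(dy, dx), axis=(0, 1))
```

Because the warp reuses the contents of `h_prev` instead of regenerating them, the unit can carry appearance forward while the predicted parameters model motion, which is the intuition behind swapping TSRUs in for conventional recurrent units.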
“…While some work in the area of video generation [7,54,45] has explored video synthesis (generating videos with no prior frame information), many approaches actually focus on the task of video prediction conditioned on past frames [41,47,38,30,23,2,34,58,60,13,27]. It can be argued that video synthesis is a combination of image generation and video prediction.…”
Section: Introduction
confidence: 99%
“…Approaches toward video prediction have largely skewed toward variations of generative adversarial networks [30,23,7,54,27]. In comparison, we are aware of only a relatively small number of approaches which propose variational autoencoders [2,60,8], autoregressive models [20,57], or flow based approaches [22].…”
Section: Introduction
confidence: 99%