2022
DOI: 10.48550/arxiv.2203.09457
Preprint

Look Outside the Room: Synthesizing A Consistent Long-Term 3D Scene Video from A Single Image

Cited by 2 publications (6 citation statements)
References 35 publications

“…Nevertheless, it achieves a clear advantage at longer horizons, demonstrating superior long-term modeling ability. Although VQFormer is also able to generate sharp images, it fails to predict correct dynamics and object attributes, as also observed in previous works (Yan et al., 2021; Ren & Wang, 2022). This shows that only a strong decoder (i.e.…”
Section: Evaluation On Video Prediction (supporting)
confidence: 74%

“…Therefore, it still underperforms RNN-based baselines in the video […]. Transformers for sequential modeling: Inspired by the success of autoregressive Transformers in language modeling (Radford et al., 2018; Brown et al., 2020), they were adapted to video generation tasks (Yan et al., 2021; Ren & Wang, 2022; Micheli et al., 2022; Nash et al., 2022). To handle the high dimensionality of images, these methods often adopt a two-stage training strategy: first mapping images to discrete tokens (Esser et al., 2021), and then learning a Transformer over the tokens.…”
Section: Related Work (mentioning)
confidence: 99%
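
The two-stage strategy quoted above (discrete tokenization followed by an autoregressive Transformer over the tokens) can be sketched in a few dozen lines. The sketch below uses PyTorch; `ToyVQTokenizer` is a stand-in for a pretrained VQGAN-style tokenizer (Esser et al., 2021), and all class names, sizes, and hyperparameters are illustrative assumptions, not the implementation used by the cited papers.

```python
import torch
import torch.nn as nn

class ToyVQTokenizer(nn.Module):
    """Stage 1 (illustrative): map frames to grids of discrete token ids
    by nearest-neighbor lookup in a small learned codebook."""
    def __init__(self, vocab_size=512, dim=64, patch=8):
        super().__init__()
        self.encoder = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        self.codebook = nn.Embedding(vocab_size, dim)

    @torch.no_grad()
    def encode(self, frames):                      # frames: (B, 3, H, W)
        z = self.encoder(frames)                   # (B, dim, H/p, W/p)
        z = z.flatten(2).transpose(1, 2)           # (B, N, dim)
        dists = torch.cdist(z, self.codebook.weight)
        return dists.argmin(dim=-1)                # token ids: (B, N)

class TokenTransformer(nn.Module):
    """Stage 2: autoregressive Transformer that predicts the next token
    id from all previous ones via a causal attention mask."""
    def __init__(self, vocab_size=512, dim=256, layers=4, heads=4, max_len=1024):
        super().__init__()
        self.tok = nn.Embedding(vocab_size, dim)
        self.pos = nn.Embedding(max_len, dim)
        layer = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, layers)
        self.head = nn.Linear(dim, vocab_size)

    def forward(self, ids):                        # ids: (B, T)
        T = ids.size(1)
        x = self.tok(ids) + self.pos(torch.arange(T, device=ids.device))
        causal = torch.triu(                       # mask out future positions
            torch.full((T, T), float("-inf"), device=ids.device), diagonal=1)
        return self.head(self.blocks(x, mask=causal))  # logits: (B, T, vocab)

# Usage sketch: tokenize a frame, then train with teacher forcing so each
# token is predicted from its predecessors.
tokenizer, model = ToyVQTokenizer(), TokenTransformer()
ids = tokenizer.encode(torch.rand(1, 3, 64, 64))   # (1, 64) token ids
logits = model(ids[:, :-1])                        # predict tokens 1..N-1
loss = nn.functional.cross_entropy(
    logits.reshape(-1, logits.size(-1)), ids[:, 1:].reshape(-1))
# In a real pipeline: loss.backward() inside a training loop.
```

In the actual two-stage recipe the tokenizer is trained first with a VQ reconstruction objective and then frozen, and generation samples tokens autoregressively before decoding them back to pixels; both steps are omitted here for brevity.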