2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr52688.2022.01123
Autoregressive Image Generation using Residual Quantization

Cited by 62 publications (33 citation statements)
References 16 publications
“…Unlike the minimax game in popular GAN models [13,1,20,21], the VQ-based generator is trained by optimizing the negative log-likelihood over all examples in the training set, leading to stable training and bypassing the "mode collapse" issue. Driven by these advantages, many image synthesis models follow the two-stage paradigm, such as image generation [31,45,2,24,16], image-to-image translation [11,10,32], text-to-image synthesis [30,29,10,7], conditional video generation [28,42,44], and image completion [11,10,47]. Apart from VQGAN, the most related works also include ViT-VQGAN [45] and RQ-VAE [24], which aim to train a better quantizer in the first stage.…”
Section: Related Work
Confidence: 99%
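The objective quoted above is ordinary maximum-likelihood training: once a stage-one quantizer maps images to grids of discrete code indices, the stage-two autoregressive prior is fit with a cross-entropy (negative log-likelihood) loss over those indices, with no adversarial game involved. A minimal sketch of one such training step, assuming PyTorch and treating vq_encode (the frozen stage-one quantizer) and prior (any autoregressive model over code sequences) as hypothetical stand-ins, not names from the paper:

    import torch
    import torch.nn.functional as F

    def nll_step(prior, vq_encode, images, optimizer):
        # Stage one is frozen: images -> flat grid of code indices (B, L).
        with torch.no_grad():
            codes = vq_encode(images)
        # Predict each code from its prefix; logits: (B, L-1, K).
        logits = prior(codes[:, :-1])
        # Negative log-likelihood over the shifted targets.
        loss = F.cross_entropy(
            logits.reshape(-1, logits.size(-1)),
            codes[:, 1:].reshape(-1),
        )
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()

Because every training example contributes directly to the likelihood, there is no discriminator to balance, which is why the quoted passage credits this setup with stable training and no mode collapse.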
“…Apart from VQGAN, the most related works also include ViT-VQGAN [45] and RQ-VAE [24], which aim to train a better quantizer in the first stage. Compared to them, our model is simple and efficient, yet effective at improving image quality without adding computational cost from higher-resolution representations [45] or more stages of recursive quantization [24].…”
Section: Related Work
Confidence: 99%
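The "stages of recursive quantization" attributed to RQ-VAE [24] refer to residual quantization: each feature vector is approximated by a sum of D codebook entries, where stage d quantizes the residual left over by stages 1 through d-1, deepening the code rather than enlarging the spatial grid. A minimal sketch, assuming a single codebook shared across stages (an illustrative simplification; names and shapes are not from the paper):

    import torch

    def residual_quantize(z, codebook, depth):
        # z: (N, C) feature vectors; codebook: (K, C) entries.
        # Returns per-stage indices (N, depth) and the cumulative
        # reconstruction (N, C), i.e. the sum of the chosen entries.
        residual = z.clone()
        recon = torch.zeros_like(z)
        indices = []
        for _ in range(depth):
            dists = torch.cdist(residual, codebook)  # (N, K) distances
            idx = dists.argmin(dim=1)                # nearest entry per vector
            chosen = codebook[idx]                   # (N, C)
            recon = recon + chosen
            residual = residual - chosen             # quantize what is left
            indices.append(idx)
        return torch.stack(indices, dim=1), recon

Each extra stage refines the approximation at the same spatial resolution, which is the trade-off the quoted passage contrasts with ViT-VQGAN's higher-resolution representations [45].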