2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2022
DOI: 10.1109/cvpr52688.2022.01042

High-Resolution Image Synthesis with Latent Diffusion Models


Cited by 3,984 publications (2,861 citation statements)

References: 13 publications
“…Unlike the minimax game in popular GAN models [13,1,20,21], the VQ-based generator is trained by optimizing negative log-likelihood over all examples in the training set, leading to a stable training and bypassing the "mode collapse" issue. Driven by these advantages, many image synthesis models follow the two-stage paradigm, such as image generation [31,45,2,24,16], image-to-image translation [11,10,32], text-to-image synthesis [30,29,10,7], conditional video generation [28,42,44], and image completion [11,10,47]. Apart from VQGAN, the most related works also include ViT-VQGAN [45] and RQ-VAE [24] that aim to train a better quantizer in the first stage.…”
Section: Related Work (mentioning)
confidence: 99%
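The two-stage paradigm described in this excerpt can be illustrated with a minimal sketch. It assumes a plain nearest-neighbour vector quantizer (in the style of VQ-VAE/VQGAN) for the first stage and a second-stage model that outputs a categorical distribution over codebook indices, trained by minimizing negative log-likelihood. The names `quantize` and `token_nll`, the toy codebook, and the latent vectors are hypothetical and not taken from the cited works.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def quantize(latents: List[Vector], codebook: List[Vector]) -> Tuple[List[int], List[Vector]]:
    """First stage (sketch): map each encoder output to its nearest codebook entry."""
    indices: List[int] = []
    quantized: List[Vector] = []
    for z in latents:
        # Squared Euclidean distance from z to every code vector.
        dists = [sum((a - b) ** 2 for a, b in zip(z, e)) for e in codebook]
        k = min(range(len(codebook)), key=dists.__getitem__)
        indices.append(k)
        quantized.append(codebook[k])
    return indices, quantized


def token_nll(indices: Sequence[int], probs: Sequence[Sequence[float]]) -> float:
    """Second stage (sketch): negative log-likelihood of a token sequence, where
    probs[t][k] is the model's predicted probability of code k at position t."""
    return -sum(math.log(probs[t][k]) for t, k in enumerate(indices))


# Toy example with a hypothetical 3-entry codebook of 2-D vectors.
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 2.0]]
latents = [[0.9, 1.2], [0.1, -0.2], [-0.8, 1.7]]
idx, zq = quantize(latents, codebook)
print(idx)  # [1, 0, 2]

# Under a uniform second-stage prediction, the NLL is 3 * ln(3).
uniform = [[1 / 3] * 3 for _ in idx]
print(round(token_nll(idx, uniform), 4))  # 3.2958
```

Because the second stage only sees discrete indices, its objective is an ordinary likelihood over tokens rather than a minimax game, which is the stability argument the excerpt makes.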
“…To the best of our knowledge, this is the first work to modulate quantized vectors and use multichannel quantization on the VQ-based image generation framework. In the following sections, we will describe the modulated quantized vectors and multichannel quantization and discuss their advantages over the concurrent models such as [32,45] and [24] in detail.…”
Section: Related Work (mentioning)
confidence: 99%
“…To address the intrinsic limitations of GANs for label map generation, we chose to apply state-of-the-art latent diffusion models (LDMs), a generative model that samples noise from a Gaussian distribution and denoises it via a Markov chain process [28] [14]. Coupled with a VAE, LDMs can become efficient and reliable generative models by performing the denoising process in the latent space.…”
Section: Label Generator (mentioning)
confidence: 99%
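The sampling process this excerpt describes (draw Gaussian noise, denoise it along a reverse Markov chain in latent space, then decode with the VAE) can be sketched as DDPM-style ancestral sampling. This is a minimal illustration under stated assumptions, not the LDM authors' implementation: the linear beta schedule, the variance choice sigma_t^2 = beta_t, and the names `sample_latent`, `zero_eps`, and `identity_decoder` are hypothetical, and the two stand-in functions replace a trained noise-prediction network and a trained VAE decoder.

```python
import math
import random
from typing import Callable, List

Latent = List[float]


def linear_beta_schedule(T: int, beta_start: float = 1e-4, beta_end: float = 0.02) -> List[float]:
    # Linearly spaced noise variances beta_1..beta_T (an assumed, common DDPM choice).
    if T == 1:
        return [beta_start]
    return [beta_start + (beta_end - beta_start) * i / (T - 1) for i in range(T)]


def alpha_bars(betas: List[float]) -> List[float]:
    # Cumulative products: abar_t = prod_{s <= t} (1 - beta_s).
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out


def q_sample(z0: Latent, t: int, abar: List[float], eps: Latent) -> Latent:
    # Forward (noising) process in closed form: z_t = sqrt(abar_t) z_0 + sqrt(1 - abar_t) eps.
    a = abar[t]
    return [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(z0, eps)]


def sample_latent(eps_model: Callable[[Latent, int], Latent], dim: int,
                  betas: List[float], rng: random.Random) -> Latent:
    # Reverse process: start from Gaussian noise and denoise step by step, in latent space.
    abar = alpha_bars(betas)
    z = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    for t in reversed(range(len(betas))):
        beta, alpha = betas[t], 1.0 - betas[t]
        eps_hat = eps_model(z, t)  # predicted noise at step t
        coef = beta / math.sqrt(1.0 - abar[t])
        mean = [(x - coef * e) / math.sqrt(alpha) for x, e in zip(z, eps_hat)]
        if t > 0:
            sigma = math.sqrt(beta)  # assumed variance choice sigma_t^2 = beta_t
            z = [m + sigma * rng.gauss(0.0, 1.0) for m in mean]
        else:
            z = mean  # no noise is added at the final step
    return z


def zero_eps(z: Latent, t: int) -> Latent:
    # Hypothetical stand-in for a trained noise-prediction network.
    return [0.0] * len(z)


def identity_decoder(z: Latent) -> Latent:
    # Hypothetical stand-in for the VAE decoder that maps latents back to pixels.
    return z


rng = random.Random(0)
sample = identity_decoder(sample_latent(zero_eps, dim=4, betas=linear_beta_schedule(50), rng=rng))
print(len(sample))  # 4
```

Running every denoising step on a low-dimensional latent rather than on full-resolution pixels is what makes the approach efficient; the decoder is applied only once, at the end.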