High-Resolution Image Synthesis with Latent Diffusion Models

Rombach, Robin; Blattmann, Andreas; Lorenz, Dominik; Esser, Patrick; Ommer, Björn

doi:10.1109/cvpr52688.2022.01042

Cited by 3,984 publications

(2,861 citation statements)

References 13 publications

Supporting

Mentioning

2,141

Contrasting

Unclassified

“…Unlike the minimax game in popular GAN models [13,1,20,21], the VQ-based generator is trained by optimizing negative log-likelihood over all examples in the training set, leading to a stable training and bypassing the "mode collapse" issue. Driven by these advantages, many image synthesis models follow the two-stage paradigm, such as image generation [31,45,2,24,16], image-to-image translation [11,10,32], text-to-image synthesis [30,29,10,7], conditional video generation [28,42,44], and image completion [11,10,47]. Apart from VQGAN, the most related works also include ViT-VQGAN [45] and RQ-VAE [24] that aim to train a better quantizer in the first stage.…”

Section: Related Work (mentioning)

confidence: 99%

“…To the best of our knowledge, this is the first work to modulate quantized vectors and use multichannel quantization on the VQ-based image generation framework. In the following sections, we will describe the modulated quantized vectors and multichannel quantization and discuss their advantages over the concurrent models such as [32,45] and [24] in detail.…”

Section: Related Work (mentioning)

confidence: 99%

“…While a higher resolution representation, e.g. 32 × 32 × 1 or 64 × 64 × 1, can also improve the reconstruction quality as in [30,32,45], the computational cost will be expensive for the second stage due to the longer sequence s = h × w (see Sec. 3.2 for details).…”

Section: Modulating Quantized Vector (mentioning)

confidence: 99%
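
The sequence-length argument in the statement above can be made concrete with a small sketch. It computes s = h × w for a few latent grid sizes and, assuming the second stage uses self-attention whose cost grows quadratically in s (an assumption, not stated in the quoted text), the cost relative to a hypothetical 16 × 16 baseline. The function names and the baseline are illustrative only.

```python
def sequence_length(h: int, w: int) -> int:
    """Number of tokens the second stage must model: s = h * w."""
    return h * w


def relative_attention_cost(h: int, w: int, base_h: int = 16, base_w: int = 16) -> float:
    """Cost relative to a base grid, ASSUMING cost scales as s**2 (self-attention).

    The 16 x 16 baseline is a hypothetical reference point, not a value
    taken from the quoted paper.
    """
    s = sequence_length(h, w)
    base = sequence_length(base_h, base_w)
    return (s / base) ** 2


if __name__ == "__main__":
    for side in (16, 32, 64):
        s = sequence_length(side, side)
        cost = relative_attention_cost(side, side)
        print(f"{side} x {side}: s = {s}, relative cost = {cost:g}x")
```

Under that assumption, doubling the side of the latent grid quadruples the sequence length and multiplies the attention cost by sixteen, which is why higher-resolution code maps are described as expensive for the second stage.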

“…On the other hand, compared with VAE [23], VQ-VAE maps an image into a palette of latent discrete codes in higher resolution with spatial structure information and learns the composition of the codes from the data itself, which overcomes the long-dragged image quality issue of VAE in image synthesis. These advantages of VQ-VAE have led to remarkable image synthesis results as evident by its recent extensions, such as VQ-VAE-2 [31], DALL-E [30], VQGAN [11], ImageBART [10], LDMs [32], VIT-VQGAN [45], RQ-VAE [24], MaskGIT [2] and DALL-E-2 [29].…”

Section: Introduction (mentioning)

confidence: 99%

MoVQ: Modulating Quantized Vectors for High-Fidelity Image Generation

Zheng, Vuong, Cai et al. 2022

Preprint

Although two-stage Vector Quantized (VQ) generative models allow for synthesizing high-fidelity and high-resolution images, their quantization operator encodes similar patches within an image into the same index, resulting in a repeated artifact for similar adjacent regions using existing decoder architectures. To address this issue, we propose to incorporate the spatially conditional normalization to modulate the quantized vectors so as to insert spatially variant information to the embedded index maps, encouraging the decoder to generate more photorealistic images. Moreover, we use multichannel quantization to increase the recombination capability of the discrete codes without increasing the cost of model and codebook. Additionally, to generate discrete tokens at the second stage, we adopt a Masked Generative Image Transformer (MaskGIT) to learn an underlying prior distribution in the compressed latent space, which is much faster than the conventional autoregressive model. Experiments on two benchmark datasets demonstrate that our proposed modulated VQGAN is able to greatly improve the reconstructed image quality as well as provide high-fidelity image generation. 36th Conference on Neural Information Processing Systems (NeurIPS 2022).
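
The behaviour described in the abstract, where similar patches are encoded into the same index, follows from nearest-neighbour lookup in a codebook. The following is a minimal sketch of that lookup only, assuming squared Euclidean distance and a toy two-dimensional codebook; the values and the `quantize` helper are hypothetical and are not taken from the MoVQ implementation.

```python
from typing import Sequence


def quantize(vec: Sequence[float], codebook: Sequence[Sequence[float]]) -> int:
    """Return the index of the codebook entry closest to vec (squared Euclidean)."""
    dists = [sum((v - c) ** 2 for v, c in zip(vec, entry)) for entry in codebook]
    return dists.index(min(dists))


# Toy codebook with three entries (hypothetical values).
codebook = [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0)]

# Two similar but not identical patch embeddings.
patch_a = (0.9, 1.1)
patch_b = (1.05, 0.95)

idx_a = quantize(patch_a, codebook)
idx_b = quantize(patch_b, codebook)

# Both patches collapse to the same index, so the decoder receives
# identical input for them and the difference between them is lost.
print(idx_a, idx_b)
```

Because the decoder only sees the shared index, any variation between the two patches has to be reintroduced elsewhere, which is the gap the abstract's spatially conditional normalization is meant to address.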

“…To address the intrinsic limitations of GANs for label map generation, we chose to apply state-of-the-art latent diffusion models (LDMs), a generative model that samples noise from a Gaussian distribution and denoises it via a Markov chain process [28] [14]. Coupled with a VAE, LDMs can become efficient and reliable generative models by performing the denoising process in the latent space.…”

Section: Label Generator (mentioning)

confidence: 99%

Can segmentation models be trained with fully synthetically generated data?

Fernández, Pinaya, Borges et al. 2022

Preprint

In order to achieve good performance and generalisability, medical image segmentation models should be trained on sizeable datasets with sufficient variability. Due to ethics and governance restrictions, and the costs associated with labelling data, scientific development is often stifled, with models trained and tested on limited data. Data augmentation is often used to artificially increase the variability in the data distribution and improve model generalisability. Recent works have explored deep generative models for image synthesis, as such an approach would enable the generation of an effectively infinite amount of varied data, addressing the generalisability and data access problems. However, many proposed solutions limit the user's control over what is generated. In this work, we propose brainSPADE, a model which combines a synthetic diffusion-based label generator with a semantic image generator. Our model can produce fully synthetic brain labels on-demand, with or without pathology of interest, and then generate a corresponding MRI image of an arbitrary guided style. Experiments show that brainSPADE synthetic data can be used to train segmentation models with performance comparable to that of models trained on real data.

Geo‐Foundation Models

Mai 2024

International Encyclopedia of Geography

As a group of task‐agnostic pretrained large‐scale neural network models that can be later adapted to numerous downstream tasks, foundation models have made a significant impact on academia, industry, and society. Meanwhile, several efforts have been made to develop foundation models for the geoscience domain. They are known as geo‐foundation models (GeoFMs). The necessary steps for GeoFM development were taken in the context of the uniqueness of geographic data and a collaborative effort among academia, industry, and society is necessary to develop a reliable, sustainable, and ethically aware framework.
