2021
DOI: 10.48550/arxiv.2111.15640
Preprint

Diffusion Autoencoders: Toward a Meaningful and Decodable Representation

Abstract: [Figure 1: Attribute manipulation and interpolation on real input images; panels show a real image edited toward "Younger/Older," "Wavy hair," and "Smiling."] Our diffusion autoencoders can encode any image into a meaningful latent code that can be interpolated or modified by a simple linear operation and decoded back to a highly realistic output.
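The edits in Figure 1 reduce to simple vector arithmetic on the semantic code. As a hedged illustration, here is a minimal PyTorch sketch of that "simple linear operation"; the function names, the attribute direction, and the surrounding encoder/decoder are assumptions for illustration, not the paper's released API.

```python
# Hypothetical sketch of linear edits in a semantic latent space.
# Names are illustrative, not the paper's API.
import torch

def edit_attribute(z_sem: torch.Tensor, direction: torch.Tensor,
                   scale: float) -> torch.Tensor:
    """Shift a semantic code along a unit-norm attribute direction
    (e.g. a 'smiling' axis fit by a linear classifier on codes)."""
    direction = direction / direction.norm()
    return z_sem + scale * direction

def interpolate(z_a: torch.Tensor, z_b: torch.Tensor, t: float) -> torch.Tensor:
    """Linearly blend two semantic codes; decoding the blend morphs images."""
    return (1.0 - t) * z_a + t * z_b

# Usage, assuming a pretrained encoder/decoder pair:
#   z = encoder(image)
#   older = decoder(edit_attribute(z, age_direction, scale=2.0))
```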

Cited by 6 publications (9 citation statements) | References 35 publications (116 reference statements)
“…The computational benefits of using diffusion to model a latent space have been noted by previous works. Preechakul et al. [38] propose an autoencoder framework where diffusion models are used to render latent variables as images, and a second diffusion model is used to generate these latents (similar to our diffusion prior). Vahdat et al. [51] use a score-based model for the latent space of a VAE, while Rombach et al. [42] use diffusion models on the latents obtained from a VQGAN-like autoencoder [14].…”
Section: Related Work (mentioning; confidence: 99%)
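To make the two-model structure above concrete, the following is a hedged training-step sketch under a standard epsilon-prediction DDPM objective: the conditional diffusion model learns to denoise an image given the semantic code produced by the encoder. Here `encoder` and `denoiser` are assumed callables standing in for the semantic encoder and conditional UNet; this is not the authors' code.

```python
# Hedged sketch of a diffusion-autoencoder training step, assuming the
# standard epsilon-prediction DDPM loss. `encoder` maps an image x0 to a
# semantic code z_sem; `denoiser(x_t, t, z_sem)` is a conditional UNet.
import torch
import torch.nn.functional as F

def diffusion_ae_loss(encoder, denoiser, x0: torch.Tensor,
                      alphas_cumprod: torch.Tensor) -> torch.Tensor:
    b = x0.shape[0]
    t = torch.randint(0, alphas_cumprod.shape[0], (b,), device=x0.device)
    a_bar = alphas_cumprod[t].view(b, 1, 1, 1)
    eps = torch.randn_like(x0)
    x_t = a_bar.sqrt() * x0 + (1 - a_bar).sqrt() * eps  # forward (noising) process
    z_sem = encoder(x0)                 # semantic code conditions the decoder
    eps_hat = denoiser(x_t, t, z_sem)   # predict the injected noise
    return F.mse_loss(eps_hat, eps)

# A second, smaller diffusion model (the "diffusion prior") would then be
# fit to the distribution of z_sem codes to enable unconditional sampling.
```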
“…However, manipulating the latent variables of DMs directly may lead to distorted images or incorrect manipulation [18], as they lack high-level semantic information. Some works [23,37] construct an external semantic latent space to address this issue. Asyrp [20] explores the deepest bottleneck of the UNet as a local semantic latent space (h-space) to accommodate semantic image manipulation.…”
Section: Introduction (mentioning; confidence: 99%)
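For intuition on how such an h-space edit differs from manipulating the diffusion latent itself, here is a hedged sketch: the UNet's deepest feature map is shifted by a semantic direction at each denoising step. The encode_to_h/decode_from_h split of the UNet forward pass is an assumed interface for illustration, not Asyrp's actual implementation.

```python
# Hedged sketch of bottleneck ("h-space") editing in the spirit of Asyrp.
# The encode_to_h/decode_from_h split is an assumed interface.
import torch

def edited_noise_pred(unet, x_t: torch.Tensor, t: torch.Tensor,
                      delta_h: torch.Tensor, scale: float) -> torch.Tensor:
    h, skips = unet.encode_to_h(x_t, t)     # down path -> bottleneck features
    h = h + scale * delta_h                 # linear semantic edit in h-space
    return unet.decode_from_h(h, skips, t)  # up path -> edited noise estimate
```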
“…Some works find that combining DDPMs with VAEs can help accelerate the sampling process of DDPMs. Preechakul et al. (2021) condition the reverse process of the DDPM on an encoded vector of an image. Pandey et al. (2022) condition the reverse process of the DDPM on an image reconstructed by a VAE.…”
Section: Related Work (mentioning; confidence: 99%)
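As a hedged illustration of what conditioning the reverse process on an encoded vector looks like in practice, here is one deterministic (eta = 0) DDIM reverse step in which the noise predictor also receives the code z. Variable names are generic; this reproduces neither cited paper's exact API.

```python
# Hedged sketch of one conditional, deterministic DDIM reverse step
# (eta = 0); the noise predictor also receives an encoded vector z.
import torch

@torch.no_grad()
def ddim_step(denoiser, x_t: torch.Tensor, t: int, t_prev: int,
              z: torch.Tensor, alphas_cumprod: torch.Tensor) -> torch.Tensor:
    a_t, a_prev = alphas_cumprod[t], alphas_cumprod[t_prev]
    t_batch = torch.full((x_t.shape[0],), t, device=x_t.device)
    eps = denoiser(x_t, t_batch, z)                         # conditional noise estimate
    x0_pred = (x_t - (1 - a_t).sqrt() * eps) / a_t.sqrt()   # implied clean image
    return a_prev.sqrt() * x0_pred + (1 - a_prev).sqrt() * eps
```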
“…In this way, we can control high-level semantics for our combined model. Other works (Preechakul et al., 2021; Pandey et al., 2022) have also shown that it is possible to combine DDPMs and VAEs to control the high-level semantics of generated images. Compared with them, our method achieves better sample quality using the same number of denoising steps, as shown in Table 2 and Table 6.…”
Section: A2 Conditional Generation (mentioning; confidence: 99%)