Designing an encoder for StyleGAN image manipulation
2021 · DOI: 10.1145/3450626.3459838

Abstract: Recently, there has been a surge of diverse methods for performing image editing by employing pre-trained unconditional generators. Applying these methods on real images, however, remains a challenge, as it necessarily requires the inversion of the images into their latent space. To successfully invert a real image, one needs to find a latent code that reconstructs the input image accurately and, more importantly, allows for its meaningful manipulation. In this paper, we carefully study the latent space of StyleGAN…

Cited by 439 publications (175 citation statements)
References 22 publications

Citation statements (ordered by relevance):
“…However, as highlighted by [59], this extended definition achieves higher reconstruction quality in exchange for lower editability. Therefore, [59] carefully design an encoder to maintain editability by mapping to regions of W+ that are close to the original distribution of W. We follow [28] and use the original latent space W. We find that StyleGAN-XL already achieves satisfactory inversion results using basic latent optimization. For inversion on the ImageNet validation set at 512², StyleGAN-XL yields PSNR = 13.5 on average, improving over BigGAN at PSNR = 10.8.…”
Section: Inversion and Manipulation (mentioning)
confidence: 97%
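
The “basic latent optimization” mentioned in this quote is typically plain gradient descent on a latent code under a reconstruction loss. Below is a minimal sketch under that reading; `G` (a pre-trained generator callable on a latent), `target`, and `w_init` are assumed placeholders, not names from any specific codebase:

```python
import torch
import torch.nn.functional as F

def invert_by_optimization(G, target, w_init, steps=1000, lr=0.01):
    """Optimize a latent code w so that G(w) reconstructs `target`."""
    w = w_init.clone().requires_grad_(True)  # e.g. start from the mean latent
    opt = torch.optim.Adam([w], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        recon = G(w)                      # synthesize an image from w
        loss = F.mse_loss(recon, target)  # pixel loss; real pipelines often add LPIPS
        loss.backward()
        opt.step()
    return w.detach()
```

The PSNR figures quoted above would be derived from exactly this kind of pixel-level error between the reconstruction and the target image.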
“…Inversion. Standard approaches for inverting G_s use either latent optimization [1,11,28] or an encoder [3,44,59]. A common way to achieve low reconstruction error is to use an extended definition of the latent space: W+.…”
Section: Inversion and Manipulation (mentioning)
confidence: 99%
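
Concretely, the difference between W and the extended W+ is just how many independent style vectors the synthesis network receives. A shape-level sketch, assuming the usual 18-layer, 512-dimensional StyleGAN configuration at 1024×1024:

```python
import torch

num_layers, dim = 18, 512

# W: a single latent vector, broadcast unchanged to every synthesis layer.
w = torch.randn(1, dim)
w_broadcast = w.unsqueeze(1).repeat(1, num_layers, 1)  # shape (1, 18, 512)

# W+: an independent latent per layer -- 18x the degrees of freedom, hence
# lower reconstruction error but, as the quotes note, weaker editability.
w_plus = torch.randn(1, num_layers, dim)               # shape (1, 18, 512)
```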
“…8 shows results of image-to-image translation. We invert the input images to GAN latent codes using an off-the-shelf GAN inversion method [Tov et al. 2021], and synthesize multi-domain images.…”
Section: Applications (mentioning)
confidence: 99%
“…Recent progress of generative adversarial networks (GANs) has opened up a new chapter in image synthesis [Brock et al. 2019; Goodfellow et al. 2014; Karras et al. 2019, 2020b, 2021], extending its application to various tasks including data augmentation [Sandfort et al. 2019], image restoration [Chan et al. 2021; Yang et al. 2021], and image and video manipulation [Kang et al. 2021; Richardson et al. 2021; Shen and Zhou 2021; Tov et al. 2021; Tzaban et al. 2022]. Unfortunately, this success has been mainly demonstrated on a few large-scale datasets such as human portraits [Karras et al. 2019; Liu et al. 2015], because of the fundamental requirement of GANs on many training samples.…”
Section: Introduction (mentioning)
confidence: 99%
“…After obtaining the trained StyleGAN2-ADA generator, we separately trained a pSp encoder network, which could embed a real photograph of soap into the StyleGAN's extended intermediate latent space W+. Mapping the real photo into the layer-wise latent space W+ leads to accurate reconstruction quality and expressiveness of the input [95–97]. Given a real image, the pSp encoder extracts the 18 latent vectors of W+ (w_1 to w_18), which are then inserted into the trained StyleGAN2-ADA generator's convolution layers corresponding to their spatial scales in order to reconstruct the input (Figure 1(b)).…”
Section: Unsupervised Learning Framework: Translucent Appearance Generation (mentioning)
confidence: 99%
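
The layer-wise insertion this quote describes can be pictured with a toy module: each of the 18 W+ vectors styles exactly one synthesis block at its own spatial scale. This is a simplified stand-in for the real StyleGAN2-ADA synthesis network (the actual blocks use modulated convolutions), and `ToySynthesis` is a hypothetical name:

```python
import torch
import torch.nn as nn

class ToySynthesis(nn.Module):
    """Simplified stand-in: one style vector per block, applied channel-wise."""
    def __init__(self, num_layers=18, dim=512):
        super().__init__()
        # one learned affine "style" projection per block, as in StyleGAN
        self.affines = nn.ModuleList(nn.Linear(dim, dim) for _ in range(num_layers))

    def forward(self, x, w_plus):             # w_plus: (batch, 18, 512)
        for i, affine in enumerate(self.affines):
            style = affine(w_plus[:, i])      # the i-th W+ vector styles block i
            x = x * style[:, :, None, None]   # stand-in for a modulated conv
        return x

# usage: feed a (batch, 18, 512) W+ code produced by an encoder such as pSp
net = ToySynthesis()
out = net(torch.randn(1, 512, 4, 4), torch.randn(1, 18, 512))
```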