2020
DOI: 10.48550/arxiv.2012.03308
Preprint

TediGAN: Text-Guided Diverse Face Image Generation and Manipulation

Abstract: In this work, we propose TediGAN, a novel framework for multi-modal image generation and manipulation with textual descriptions. The proposed method consists of three components: a StyleGAN inversion module, visual-linguistic similarity learning, and instance-level optimization. The inversion module trains an image encoder to map real images to the latent space of a well-trained StyleGAN. The visual-linguistic similarity module learns text-image matching by mapping the image and text into a common embedding space. […]
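As a rough illustration of that second component, the sketch below shows one common way to learn text-image matching in a shared embedding space: a symmetric contrastive loss over paired embeddings. Everything here (the functional form, the 512-d embedding size, the specific loss) is an assumption for illustration, not TediGAN's actual objective.

```python
# Hypothetical sketch of visual-linguistic similarity learning: pull each
# image embedding toward its paired text embedding in a common space.
# The symmetric contrastive form and dimensions are illustrative assumptions.
import torch
import torch.nn.functional as F

def matching_loss(image_emb: torch.Tensor, text_emb: torch.Tensor) -> torch.Tensor:
    """Symmetric contrastive matching loss over a batch of paired embeddings."""
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = image_emb @ text_emb.t()           # (B, B) cosine similarities
    targets = torch.arange(logits.size(0))      # i-th image pairs with i-th text
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

# Stand-in embeddings (batch of 4 pairs, 512-d common space); in practice
# these would come from trained image and text encoders.
loss = matching_loss(torch.randn(4, 512), torch.randn(4, 512))
```

In practice, minimizing such a loss pushes matching image-text pairs together and mismatched pairs apart, which is what lets textual descriptions steer latent codes in a shared space.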

Cited by 13 publications (22 citation statements) | References 50 publications (93 reference statements)
“…One reason is that we cannot obtain inversion results identical to the original images, so the model loses part of the original semantic information during the w′ optimization process. Another reason is that the latent space W of StyleGAN is still entangled with some semantic attributes [33], which means that when we change some of the attributes, the remaining attributes are also affected. We leave improving the space W with better disentanglement for future work.…”
Section: Qualitative Results
Confidence: 99%
“…These methods typically reconstruct the original images by optimizing the latent codes, using either gradient descent [40] or other iterative algorithms [12]. Some works [4, 33, 40] attempt to combine these two ideas: the latent codes produced by the GAN inversion encoder serve as the initialization for the optimization step, since it is hard to obtain perfect reconstructions from the inverted latent codes of a single GAN inversion encoder alone. Zhu et al. [40] propose to train the encoder on real images instead of fake images, so that the trained encoder adapts better to real scenarios.…”
Section: GAN Inversion
Confidence: 99%
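The hybrid strategy this excerpt describes (encoder output as initialization, iterative refinement afterwards) can be sketched as follows. This is an illustrative outline under stated assumptions: `encoder`, `generator`, and the plain pixel-wise reconstruction loss are stand-ins, not the cited papers' actual models or objectives, which typically add perceptual and latent-regularization terms.

```python
# Illustrative sketch of hybrid GAN inversion: an encoder's prediction
# initializes the latent code, which gradient descent then refines against
# a reconstruction loss. `encoder` and `generator` are assumed pretrained
# stand-ins for an inversion encoder and a StyleGAN generator.
import torch
import torch.nn.functional as F

def invert(encoder, generator, image, steps=200, lr=0.01):
    w = encoder(image).detach().clone().requires_grad_(True)  # encoder init
    opt = torch.optim.Adam([w], lr=lr)
    for _ in range(steps):
        recon = generator(w)
        loss = F.mse_loss(recon, image)   # pixel-wise reconstruction only
        opt.zero_grad()
        loss.backward()
        opt.step()
    return w.detach()
```

The initialization matters because pure optimization from a random code tends to land in poor local minima, while the encoder alone rarely reconstructs the input exactly; combining the two is the compromise the excerpt describes.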
“…Zheng et al. (2020a) encode images as a graph of interacting objects, which lets the user modify an image by editing its scene graph. GANs are also frequently used to modify images based on natural language input (Nam et al., 2018; Li et al., 2020a; Xia et al., 2020). Lastly, CLIP (Radford et al., 2021) can be used in combination with a StyleGAN generator to make semantic edits in images, as exemplified in Patashnik et al. (2021).…”
Section: Related Work
Confidence: 99%
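As a sketch of the CLIP-plus-StyleGAN editing idea mentioned at the end of this excerpt (in the spirit of Patashnik et al., 2021, not their actual method), one can optimize a latent code so the generated image matches a text prompt under CLIP. The generator `G` is an assumed pretrained model whose output already matches CLIP's expected resolution and normalization; dtype and preprocessing details are glossed over.

```python
# Sketch of CLIP-guided latent editing: maximize CLIP similarity between the
# generated image and a text prompt. `G` is an assumed pretrained StyleGAN
# generator returning (1, 3, 224, 224) images in CLIP's input range;
# identity-preservation and regularization terms are omitted.
import torch
import torch.nn.functional as F
import clip  # OpenAI CLIP: https://github.com/openai/CLIP

device = "cpu"  # fp32 on CPU keeps this sketch simple; CUDA loads fp16 weights
model, _ = clip.load("ViT-B/32", device=device)

def edit_latent(G, w_init, prompt, steps=100, lr=0.05):
    text_emb = model.encode_text(clip.tokenize([prompt]).to(device)).detach()
    w = w_init.clone().requires_grad_(True)
    opt = torch.optim.Adam([w], lr=lr)
    for _ in range(steps):
        img_emb = model.encode_image(G(w))
        loss = 1 - F.cosine_similarity(img_emb, text_emb).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return w.detach()
```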