Special Interest Group on Computer Graphics and Interactive Techniques Conference Proceedings 2022
DOI: 10.1145/3528233.3530747

CLIP2StyleGAN: Unsupervised Extraction of StyleGAN Edit Directions

Cited by 68 publications (16 citation statements)
References 28 publications
“…TediGAN [51] proposes to generate an image corresponding to a given text by training an encoder to map the text into the StyleGAN latent space. Several recent works [1,10] jointly utilize pre-trained generative models [4,12,14] and CLIP to steer the generated result towards the desired target description. More recently, diffusion models (DM) [17,33,41] achieve state-of-the-art results on text-to-image synthesis by decomposing the image formation process into a sequential application of denoising autoencoders.…”
Section: Text-based Image Generation (mentioning, confidence: 99%)
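As a rough illustration of the CLIP-guided steering described in this statement, the sketch below optimizes a StyleGAN latent code so that the generated image matches a text prompt. It assumes a hypothetical pretrained generator callable as generator(w) and the open-source clip package; the cited works [1,10] use more elaborate losses and regularizers than this minimal cosine-distance objective.

```python
# Minimal sketch of CLIP-guided latent steering. `generator` is a hypothetical
# pretrained StyleGAN-like model returning images in [-1, 1]; only the `clip`
# package calls below are real APIs.
import torch
import torch.nn.functional as F
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"
clip_model, _ = clip.load("ViT-B/32", device=device)
clip_model = clip_model.float()  # avoid fp16/fp32 mismatch when backpropagating

def steer_latent(generator, w_init, prompt, steps=200, lr=0.01):
    """Optimize a latent code so the generated image matches the text prompt."""
    with torch.no_grad():
        text_feat = clip_model.encode_text(clip.tokenize([prompt]).to(device))
        text_feat = text_feat / text_feat.norm(dim=-1, keepdim=True)

    w = w_init.clone().detach().requires_grad_(True)
    opt = torch.optim.Adam([w], lr=lr)
    for _ in range(steps):
        img = (generator(w) + 1) / 2                          # map to [0, 1]
        img = F.interpolate(img, size=224, mode="bilinear")   # CLIP input size
        img_feat = clip_model.encode_image(img)               # CLIP mean/std normalization omitted
        img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)
        loss = 1.0 - (img_feat * text_feat).sum(dim=-1).mean()  # cosine distance
        opt.zero_grad()
        loss.backward()
        opt.step()
    return w.detach()
```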
“…With the introduction of StyleGAN [14][15][16] mapping networks, the input random noise can first be mapped to another latent space that has disentangled semantics, allowing the model to generate images with better quality. Further, exploring the latent space of StyleGAN has been proven useful by several works [1,17,33] in text-driven image synthesis and manipulation tasks, where they utilize the pretrained vision-language model CLIP [37] to manipulate pretrained unconditional StyleGAN networks. In order to relieve the need for paired text data during the training phase, Lafite [55] proposes to adopt image CLIP embeddings as training input while using text CLIP embeddings during the inference phase.…”
Section: Text-to-image Synthesis (mentioning, confidence: 99%)
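The mapping network referred to in this statement can be sketched as below; the depth, width, and normalization shown are illustrative assumptions rather than the exact StyleGAN configuration.

```python
# Minimal sketch of a StyleGAN-style mapping network: random noise z is passed
# through an MLP into an intermediate latent space W, which tends to be more
# semantically disentangled than Z and is what the quoted works explore.
import torch
import torch.nn as nn

class MappingNetwork(nn.Module):
    def __init__(self, z_dim=512, w_dim=512, num_layers=8):
        super().__init__()
        layers, in_dim = [], z_dim
        for _ in range(num_layers):
            layers += [nn.Linear(in_dim, w_dim), nn.LeakyReLU(0.2)]
            in_dim = w_dim
        self.net = nn.Sequential(*layers)

    def forward(self, z):
        # Pixel-norm style normalization of z before mapping, as in StyleGAN.
        z = z / torch.sqrt(z.pow(2).mean(dim=1, keepdim=True) + 1e-8)
        return self.net(z)

z = torch.randn(4, 512)      # random noise in Z
w = MappingNetwork()(z)      # code in the intermediate latent space W
```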
“…Such common extensions include retrieval [4], image generation [11], continual learning [61], object detection, few-shot recognition [22], semantic segmentation [47], etc. Others leveraged CLIP's image/text encoders with StyleGAN [45] to enable intuitive text-based semantic image manipulation [1]. In our work we adapt CLIP for zero-shot FG-SBIR in a cross-category setting.…”
Section: CLIP in Vision Tasks (mentioning, confidence: 99%)