2021
DOI: 10.48550/arxiv.2103.17249
Preprint

StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery

Abstract: [Teaser figure: example text-driven manipulations for the prompts "Stone", "Mohawk hairstyle", "Without makeup", "Cute cat", "Lion", "Gothic church".] * Equal contribution, ordered alphabetically. Code and video are available at https://github.com/orpatashnik/StyleCLIP

Cited by 35 publications (64 citation statements) | References 39 publications
“…Second, it simplifies guidance when conditioning on information that is difficult to predict with a classifier (such as text). Since CLIP provides a score of how close an image is to a caption, several works have used it to steer generative models like GANs towards a user-defined text caption (Galatolo et al., 2021; Patashnik et al., 2021; Murdock, 2021; Gal et al., 2021). To apply the same idea to diffusion models, we can replace the classifier with a CLIP model in classifier guidance.…”
Section: Guided Diffusion (citation type: mentioning; confidence: 99%)
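To make the substitution concrete, here is a minimal PyTorch-style sketch of one CLIP-guided sampling step, assuming a hypothetical clip_model with an encode_image method and a precomputed unit-norm text embedding: the classifier's gradient of log p(y | x_t) is simply replaced by the gradient of the CLIP image-text similarity.

```python
import torch

def clip_guided_mean(x_t, mean, variance, clip_model, text_embed, scale=100.0):
    """One CLIP-guided reverse-diffusion step (sketch, assumed interfaces).

    Classifier guidance shifts the posterior mean by
    scale * Sigma * grad_x log p(y | x_t); here that gradient is
    replaced by the gradient of the CLIP image-text similarity.
    """
    x = x_t.detach().requires_grad_(True)
    img_embed = clip_model.encode_image(x)                      # (B, D), assumed API
    img_embed = img_embed / img_embed.norm(dim=-1, keepdim=True)
    sim = (img_embed * text_embed).sum()                        # cosine similarity
    grad = torch.autograd.grad(sim, x)[0]                       # d(sim) / d(x_t)
    return mean + scale * variance * grad                       # guided posterior mean
```

In practice the CLIP model is typically also trained on noised images (as in GLIDE) so that its gradients stay informative early in the reverse process.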
“…Besides producing impressive image samples, generative adversarial networks (GANs) [9] have been shown to learn meaningful latent spaces [18], with extensive studies on multiple derived spaces [15,44] and various knobs and controls for conditional human face generation [12,28,42]. Encoding an image into the GAN's latent space requires an optimization-based inversion process [19,45] or an external image encoder [30], which has limited reconstruction fidelity (or produces latent codes in much higher dimensions, outside the learned manifold).…”
Section: Related Work (citation type: mentioning; confidence: 99%)
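The optimization-based inversion mentioned above amounts to gradient descent on a reconstruction loss over the latent code. A minimal sketch follows, assuming a hypothetical generator(w) callable that maps a latent to an image; a pixel-wise loss stands in for the perceptual (e.g., LPIPS) and regularization terms real pipelines add.

```python
import torch
import torch.nn.functional as F

def invert(generator, target, w_init, num_steps=500, lr=0.05):
    """Optimization-based GAN inversion (sketch, assumed interfaces).

    Optimizes a latent code w so that generator(w) reconstructs
    `target`. Real implementations add perceptual losses and latent
    regularizers to keep w on the learned manifold.
    """
    w = w_init.detach().clone().requires_grad_(True)  # e.g., start from the mean latent
    opt = torch.optim.Adam([w], lr=lr)
    for _ in range(num_steps):
        recon = generator(w)                          # assumed: latent -> image
        loss = F.mse_loss(recon, target)              # pixel loss only, for brevity
        opt.zero_grad()
        loss.backward()
        opt.step()
    return w.detach()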
“…GANs are also frequently used to modify images based on natural language input (Nam et al., 2018; Li et al., 2020a; Xia et al., 2020). Lastly, CLIP (Radford et al., 2021) can be used in combination with a StyleGAN generator to make semantic edits in images, as exemplified by Patashnik et al. (2021).…”
Section: Related Work (citation type: mentioning; confidence: 99%)
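The simplest StyleCLIP variant performs exactly this kind of edit by latent optimization. Below is a hedged sketch using the same hypothetical generator and clip_model interfaces as above; it minimizes CLIP dissimilarity to a text prompt plus an L2 term that keeps the code near its starting point (the paper's full objective additionally includes an identity-preservation loss).

```python
import torch

def text_edit(generator, clip_model, text_embed, w_start,
              num_steps=200, lr=0.1, l2_lambda=0.008):
    """Text-driven latent optimization in the spirit of StyleCLIP (sketch).

    Minimizes the CLIP dissimilarity between the generated image and
    a text prompt, plus an L2 penalty keeping w close to the source
    latent so the edit stays local.
    """
    w = w_start.detach().clone().requires_grad_(True)
    opt = torch.optim.Adam([w], lr=lr)
    for _ in range(num_steps):
        img = generator(w)                                   # assumed: latent -> image
        emb = clip_model.encode_image(img)                   # assumed API
        emb = emb / emb.norm(dim=-1, keepdim=True)
        clip_loss = 1.0 - (emb * text_embed).sum(dim=-1).mean()
        loss = clip_loss + l2_lambda * ((w - w_start) ** 2).sum()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return w.detach()
```

The L2 weight trades off edit strength against fidelity to the original image; StyleCLIP also describes faster alternatives (latent mappers and global directions) that avoid per-image optimization.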