StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery

Patashnik, Or; Wu, Zongze; Shechtman, Eli; Cohen-Or, Daniel; Lischinski, Dani

doi:10.1109/iccv48922.2021.00209

Cited by 710 publications

(520 citation statements)

References 22 publications

Supporting

Mentioning: 439

Contrasting

Unclassified

“…Similarly, [1] uses conditional continuous normalization flows to perform supervised attribute processing in the latent space of StyleGAN2. Recently, text-based manipulation methods have been proposed [15,21], which use CLIP [24] to perform fine-grained and disentangled manipulations of images.…”

Section: Latent Space Manipulation (mentioning)

confidence: 99%

Discovering Multiple and Diverse Directions for Cognitive Image Properties

Kocasari, Bag, Yüksel et al. 2022

Preprint

Recent research has shown that it is possible to find interpretable directions in the latent spaces of pre-trained GANs. These directions enable controllable generation and support a variety of semantic editing operations. While previous work has focused on discovering a single direction that performs a desired editing operation such as zoom-in, limited work has been done on the discovery of multiple and diverse directions that can achieve the desired edit. In this work, we propose a novel framework that discovers multiple and diverse directions for a given property of interest. In particular, we focus on the manipulation of cognitive properties such as Memorability, Emotional Valence and Aesthetics. We show with extensive experiments that our method successfully manipulates these properties while producing diverse outputs. Our project page and source code can be found at http://catlabteam.github.io/latentcognitive.

“…Shen et al [51] perform eigenvalue decomposition on the affine transformation layers of StyleGAN2 generators [28] to learn versatile manipulation directions. Xia et al [59] and Patashnik and Wu et al [44] manipulate images using a human-understandable text prompt providing a more intuitive image editing interface.…”

Section: Gan-based Image Editing (mentioning)

confidence: 99%

“…Instead, one may view our work as complementing these existing approaches. For example, as shall be shown, pairing StyleFusion with GANSpace [17] or StyleCLIP [44] leverages their diverse manipulations while ensuring that the resulting edits alter only the desired semantic regions. [50], StyleCLIP [44]) results in more precise image manipulations.…”

Section: Gan-based Image Editing (mentioning)

confidence: 99%

“…In Figure 14 we show the advantage of using StyleFusion's disentangled representation when editing images using three latent traversal editing methods: InterFaceGAN [50], GANSpace [17], and StyleCLIP [44]. For InterFaceGAN and GANSpace we use their official implementation and latent directions while for StyleCLIP we adapt the official implementation to train a latent mapper which manipulates only the desired image region.…”

Section: Styleclip Interface Ganspace Ganspace (mentioning)

confidence: 99%

StyleFusion: A Generative Model for Disentangling Spatial Segments

Kafri, Patashnik, Alaluf et al. 2021

Preprint

Self Cite

We present StyleFusion, a new mapping architecture for StyleGAN, which takes as input a number of latent codes and fuses them into a single style code. Inserting the resulting style code into a pre-trained StyleGAN generator results in a single harmonized image in which each semantic region is controlled by one of the input latent codes. Effectively, StyleFusion yields a disentangled representation of the image, providing fine-grained control over each region of the generated image. Moreover, to help facilitate global control over the generated image, a special input latent code is incorporated into the fused representation. StyleFusion operates in a hierarchical manner, where each level is tasked with learning to disentangle a pair of image regions (e.g., the car body and wheels). The resulting learned disentanglement allows one to modify both local, fine-grained semantics (e.g., facial features) as well as more global features (e.g., pose and background), providing improved flexibility in the synthesis process. As a natural extension, StyleFusion enables one to perform semantically-aware cross-image mixing of regions that are not necessarily aligned. Finally, we demonstrate how StyleFusion can be paired with existing editing techniques to more faithfully constrain the edit to the user's region of interest. Code is available at: https://github.com/OmerKafri/StyleFusion.

“…Our method consists of four steps: First, we find a linear editing direction responsible for a faulty attribute which we wish to enhance. Such directions can be found with weak supervision [32], in a zero-shot manner [24] or even in an unsupervised fashion [16,33]. Second, we build on prior observations that latent space distances are linearly correlated with semantic attribute strengths [22].…”

Section: Introduction (mentioning)

confidence: 99%

Self-Conditioned Generative Adversarial Networks for Image Editing

Liu, Gal, Bermano et al. 2022

Preprint

Self Cite

Generative Adversarial Networks (GANs) are susceptible to bias, learned from either the unbalanced data, or through mode collapse. The networks focus on the core of the data distribution, leaving the tails, or the edges of the distribution, behind. We argue that this bias is responsible not only for fairness concerns, but that it plays a key role in the collapse of latent-traversal editing methods when deviating away from the distribution's core. Building on this observation, we outline a method for mitigating generative bias through a self-conditioning process, where distances in the latent-space of a pre-trained generator are used to provide initial labels for the data. By fine-tuning the generator on a re-sampled distribution drawn from these self-labeled data, we force the generator to better contend with rare semantic attributes and enable more realistic generation of these properties. We compare our models to a wide range of latent editing methods, and show that by alleviating the bias they achieve finer semantic control and better identity preservation through a wider range of transformations. Our code and models will be available at https://github.com/yzliu567/sc-gan.
