Encoding in Style: a StyleGAN Encoder for Image-to-Image Translation

Richardson, Elad; Alaluf, Yuval; Patashnik, Or; Nitzan, Yotam; Azar, Yaniv; Shapiro, Stav; Cohen-Or, Daniel

doi:10.1109/cvpr46437.2021.00232

Cited by 793 publications

(592 citation statements)

References 48 publications

(51 reference statements)

“…To manipulate real images, it is first necessary to invert them into their latent code representations. This is typically done via per-image optimization [63,38,8,12,1,2,28,54,48] or by training an encoder to learn a direct mapping from a given image to its corresponding latent code [63,45,62,46,47,55,5,9]. For a comprehensive survey on GAN inversion, we refer the reader to Xia et al [60].…”

Section: Real Image Editing (mentioning; confidence: 99%)
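The quote contrasts two ways of inverting a real image: per-image optimization and a trained encoder. A minimal sketch of the first is shown here. It assumes a hypothetical toy linear "generator" in place of a frozen, pre-trained StyleGAN and a plain squared-error reconstruction loss; real inversion methods use a deep generator and typically add perceptual loss terms. The function names are illustrative only.

```python
import random


def generate(A, w):
    """Hypothetical toy 'generator': a fixed linear map x = A w.

    Stands in for a frozen, pre-trained StyleGAN generator.
    """
    return [sum(a * w_j for a, w_j in zip(row, w)) for row in A]


def invert_by_optimization(A, x, steps=500, lr=0.05, seed=0):
    """Per-image optimization: gradient descent on ||G(w) - x||^2 over w.

    The generator stays fixed; only the latent code w is updated.
    """
    rng = random.Random(seed)
    dim = len(A[0])
    w = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # random initial latent code
    for _ in range(steps):
        residual = [g - t for g, t in zip(generate(A, w), x)]
        # Gradient of the squared error with respect to w: 2 * A^T (A w - x)
        grad = [
            2.0 * sum(A[i][j] * residual[i] for i in range(len(A)))
            for j in range(dim)
        ]
        w = [w_j - lr * g_j for w_j, g_j in zip(w, grad)]
    return w


if __name__ == "__main__":
    A = [[1.0, 0.5], [0.0, 1.0], [0.5, 0.0]]
    w_true = [0.3, -1.2]
    x = generate(A, w_true)  # the "real image" we want to invert
    w_hat = invert_by_optimization(A, x)
    print([round(v, 4) for v in w_hat])
```

The recovered code should match `w_true` closely. An encoder-based method replaces the whole loop with a single forward pass of a trained network, trading some reconstruction accuracy for speed; this is the slowness argument made against optimization-based methods in the related-work excerpts on this page.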

StyleFusion: A Generative Model for Disentangling Spatial Segments

Kafri, Patashnik, Alaluf et al. 2021

Preprint

Self Cite

We present StyleFusion, a new mapping architecture for StyleGAN, which takes as input a number of latent codes and fuses them into a single style code. Inserting the resulting style code into a pre-trained StyleGAN generator results in a single harmonized image in which each semantic region is controlled by one of the input latent codes. Effectively, StyleFusion yields a disentangled representation of the image, providing fine-grained control over each region of the generated image. Moreover, to help facilitate global control over the generated image, a special input latent code is incorporated into the fused representation. StyleFusion operates in a hierarchical manner, where each level is tasked with learning to disentangle a pair of image regions (e.g., the car body and wheels). The resulting learned disentanglement allows one to modify both local, fine-grained semantics (e.g., facial features) as well as more global features (e.g., pose and background), providing improved flexibility in the synthesis process. As a natural extension, StyleFusion enables one to perform semantically-aware cross-image mixing of regions that are not necessarily aligned. Finally, we demonstrate how StyleFusion can be paired with existing editing techniques to more faithfully constrain the edit to the user's region of interest. Code is available at: https://github.com/OmerKafri/StyleFusion.

“…Recently, many works have been developed for the task of GAN inversion, i.e., reversing a given image back to a latent code with a pre-trained GAN model. Existing methods either optimize the latent code [1] or learn an extra encoder to project the image space back to the latent space [12,38]. Abdal et al [1] embedded images into an extended latent space of StyleGAN, allowing further semantic image editing operations.…”

Section: Related Work (mentioning; confidence: 99%)

“…These optimization-based methods, however, are slow and improper for real-world applications. To address this issue, Pixel2Style2Pixel (pSp) [38] embeds real images into extended latent space without additional optimization, which can be used in a wide range of image-to-image translation tasks. Menon et al [34] proposed a self-supervised approach that traverses the HR natural image manifold, searching for images that can downscale to the original LR image.…”

Section: Related Work (mentioning; confidence: 99%)

“…It should be noted that the GAN inversion methods [12,34,38] share a similar idea with our GPEN; however, they keep the pretrained GANs unchanged for consistent and convenient face manipulations. While in GPEN, we carefully design and pre-train the GAN blocks and fine-tune the GAN priors for effective BFR.…”

(Footnote in the quoted text: The definition of "Mod" and "Demod" can be found in [22].)

Section: Motivation and Framework (mentioning; confidence: 99%)

“…With the rapid advancement of GAN techniques [21,22], recently some methods have been proposed to reconstruct faces from extremely low resolution inputs [12,34,38]. Richardson et al [38] employed an encoder network to generate a series of style vectors before feeding them into a pre-trained generator, achieving a generic image-to-image translation framework. However, such methods can only work on non-blind image super-resolution problems.…”

Section: Introduction (mentioning; confidence: 99%)
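The excerpt describes Richardson et al.'s encoder as producing a series of style vectors, rather than a single latent code, which are then fed to a frozen generator. A minimal sketch of how such a series could be assembled is shown here. It assumes 18 style vectors (the extended W+ space of a 1024x1024 StyleGAN) and a coarse/medium/fine split across three feature-pyramid levels; that split is our reading of the pSp design, not something stated on this page. `features` and `map_to_style` are hypothetical placeholders for the learned encoder outputs and per-style heads.

```python
NUM_STYLES = 18  # assumed: one style vector per style input of a 1024x1024 StyleGAN (W+)


def pyramid_level(style_index):
    """Map a style index to the feature-pyramid level it is read from.

    Assumed split: coarse styles (0-2) from the lowest-resolution feature map,
    medium styles (3-6) from the middle one, fine styles (7-17) from the
    highest-resolution one.
    """
    if not 0 <= style_index < NUM_STYLES:
        raise ValueError(f"style index must be in [0, {NUM_STYLES}), got {style_index}")
    if style_index < 3:
        return "coarse"
    if style_index < 7:
        return "medium"
    return "fine"


def assemble_w_plus(features, map_to_style):
    """Build the series of style vectors fed to the frozen generator.

    features: dict with keys "coarse", "medium", "fine" (hypothetical encoder outputs).
    map_to_style: hypothetical per-index head turning one feature map into one style vector.
    """
    return [map_to_style(i, features[pyramid_level(i)]) for i in range(NUM_STYLES)]


if __name__ == "__main__":
    levels = [pyramid_level(i) for i in range(NUM_STYLES)]
    print({name: levels.count(name) for name in ("coarse", "medium", "fine")})
```

Routing each style index to a specific pyramid level reflects the usual observation that early StyleGAN style inputs govern coarse attributes such as pose and layout, while later ones govern finer detail such as texture and color.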

GAN Prior Embedded Network for Blind Face Restoration in the Wild

Yang, Ren, Xie et al. 2021

Preprint

Blind face restoration (BFR) from severely degraded face images in the wild is a very challenging problem. Due to the high ill-posedness of the problem and the complex unknown degradation, directly training a deep neural network (DNN) usually cannot lead to acceptable results. Existing generative adversarial network (GAN) based methods can produce better results but tend to generate over-smoothed restorations. In this work, we propose a new method by first learning a GAN for high-quality face image generation and embedding it into a U-shaped DNN as a prior decoder, then fine-tuning the GAN prior embedded DNN with a set of synthesized low-quality face images. The GAN blocks are designed to ensure that the latent code and noise input to the GAN can be respectively generated from the deep and shallow features of the DNN, controlling the global face structure, local face details and background of the reconstructed image. The proposed GAN prior embedded network (GPEN) is easy to implement, and it can generate visually photo-realistic results. Our experiments demonstrated that the proposed GPEN achieves significantly superior results to state-of-the-art BFR methods both quantitatively and qualitatively, especially for the restoration of severely degraded face images in the wild. The source code and models can be found at https://github.com/yangxy/GPEN.

Controlling StyleGANs using rough scribbles via one‐shot learning

Endo, Kanamori 2022

Computer Animation & Virtual

This paper tackles the challenging problem of one-shot semantic image synthesis from rough sparse annotations, which we call "semantic scribbles." Namely, from only a single training pair annotated with semantic scribbles, we generate realistic and diverse images with layout control over, for example, facial part layouts and body poses. We present a training strategy that performs pseudo labeling for semantic scribbles using the StyleGAN prior. Our key idea is to construct a simple mapping between StyleGAN features and each semantic class from a single example of semantic scribbles. With such mappings, we can generate an unlimited number of pseudo semantic scribbles from random noise to train an encoder for controlling a pretrained StyleGAN generator. Even with our rough pseudo semantic scribbles obtained via one-shot supervision, our method can synthesize high-quality images thanks to our GAN inversion framework. We further offer optimization-based postprocessing to refine the pixel alignment of synthesized images. Qualitative and quantitative results on various datasets demonstrate improvement over previous approaches in one-shot settings.
