Pivotal Tuning for Latent-based Editing of Real Images

Roich, Daniel; Mokady, Ron; Bermano, Amit H.; Cohen-Or, Daniel

doi:10.48550/arxiv.2106.05744

Cited by 40 publications

(134 citation statements)

References 35 publications

(76 reference statements)


“…In addition, there are other two-stage approaches but are not hybrid ones. For example, PIE [44] uses optimization methods on both phases, while PTI [39] combines optimization process with the generator fine-tuning technique. In comparison, our work is also a two-stage method but differs from the above ones in that our approach is only based purely on the encoder-based manner in both phases.…”

Section: Related Work (mentioning)

confidence: 99%

“…Our goal in this phase is to recover the missing information of input x and thus reduce the discrepancy between x and xw . Other two-stage methods use either per-image optimization process [44] or per-image fine-tuning G [39] to improve reconstruction quality further. However, the drawback of such these approaches is the slow inference time.…”

Section: Phase II: Generator Refinement via Hypernet- (mentioning)

confidence: 99%

“…PIE [44] opts first to use an optimization process to locate the latent code in W space to preserve the editing ability and then optimize further the latent in the W + space to enhance the reconstruction quality. PTI [39] approaches the same as PIE in the first stage. However, in the second stage, they choose the generator fine-tuning option to improve reconstruction results further.…”

Section: Introduction (mentioning)

confidence: 99%


HyperInverter: Improving StyleGAN Inversion via Hypernetwork

Dinh, Tran, Nguyen et al. 2021

Preprint


Real-world image manipulation has achieved fantastic progress in recent years as a result of the exploration and utilization of GAN latent spaces. GAN inversion is the first step in this pipeline, which aims to map the real image to the latent code faithfully. Unfortunately, the majority of existing GAN inversion methods fail to meet at least one of the three requirements listed below: high reconstruction quality, editability, and fast inference. We present a novel two-phase strategy in this research that fits all requirements at the same time. In the first phase, we train an encoder to map the input image to StyleGAN2 W-space, which was proven to have excellent editability but lower reconstruction quality. In the second phase, we supplement the reconstruction ability in the initial phase by leveraging a series of hypernetworks to recover the missing information during inversion. These two steps complement each other to yield high reconstruction quality thanks to the hypernetwork branch and excellent editability due to the inversion done in the W-space. Our method is entirely encoder-based, resulting in extremely fast inference. Extensive experiments on two challenging datasets demonstrate the superiority of our method.


“…Although traversing the latent space of unconditional GANs can achieve image editing in closed-domain images such as faces, its incapability of generating real-world images (e.g., multiple objects and complex scenes) limits their generalization and application. In addition, since their hidden spaces need to retain all the information of the generated outputs, the inversion [54] of an open-domain image is usually compromised for photo fidelity [1,34]. In contrast, the editing space of our proposed model does not have such limitations.…”

Section: Related Work (mentioning)

confidence: 99%

“…One related work is styleGAN [19], which is trained to generate realistic images for closed-domain categories such as faces, cats, and cars. Since then, a series of manipulation works [6,34,35,41,44,45] have been built upon styleGAN by inverting a given image to its latent space and then manipulating the latent code to generate a new image while keeping the generator intact.…”

Section: Introduction (mentioning)

confidence: 99%

SpaceEdit: Learning a Unified Editing Space for Open-Domain Image Editing

Shi, Xu, Zheng et al. 2021

Preprint


Recently, large pretrained models (e.g., BERT, Style-GAN, CLIP) have shown great knowledge transfer and generalization capability on various downstream tasks within their domains. Inspired by these efforts, in this paper we propose a unified model for open-domain image editing focusing on color and tone adjustment of open-domain images while keeping their original content and structure. Our model learns a unified editing space that is more semantic, intuitive, and easy to manipulate than the operation space (e.g., contrast, brightness, color curve) used in many existing photo editing softwares. Our model belongs to the image-to-image translation framework which consists of an image encoder and decoder, and is trained on pairs of before-and after-images to produce multimodal outputs. We show that by inverting image pairs into latent codes of the learned editing space, our model can be leveraged for various downstream editing tasks such as language-guided image editing, personalized editing, editing-style clustering, retrieval, etc. We extensively study the unique properties of the editing space in experiments and demonstrate superior performance on the aforementioned tasks.


Re-Aging GAN++: Temporally Consistent Transformation of Faces in Videos

Makhmudkhujaev, Hong, Park 2023

IEEE Access


The challenge of transforming the apparent age of human faces in videos has not been adequately addressed due to the complexities involved in preserving spatial and temporal consistency. This task is further complicated by the scarcity of video datasets featuring specific individuals across various age groups. To address these issues, we introduce Re-Aging GAN++ (RAGAN++), a unified framework designed to perform facial age transformation in videos utilizing an innovative GAN-based model trained on still image data. Initially, the modulation process acquires multi-scale personalized age features to depict the attributes of the target age group. Subsequently, the encoder applies Gaussian smoothing at each scale, ensuring a seamless frame-to-frame transition that accounts for inter-frame variations, such as facial motion within the camera's field of view. Remarkably, the proposed model demonstrates the ability to perform facial age transformation in videos despite being trained exclusively on image data. Our proposed method exhibits exceptional spatio-temporal consistency concerning facial identity, expression, and pose while maintaining natural variations across diverse age groups.

Index terms: Video generation, age manipulation, GAN, spatio-temporal consistency.
