Pivotal Tuning for Latent-based Editing of Real Images

Roich, Daniel; Mokady, Ron; Bermano, Amit H.; Cohen-Or, Daniel

doi:10.1145/3544777

Cited by 241 publications

(141 citation statements)

References 41 publications

Mentioning: 139

“…Editing a real image requires finding an initial noise vector that produces the given input image when fed into the diffusion process. This process, known as inversion, has recently drawn considerable attention for GANs, e.g., [51,1,3,35,50,43,45,47], but has not yet been fully addressed for text-guided diffusion models.…”

Section: Applications (mentioning)

confidence: 99%

Prompt-to-Prompt Image Editing with Cross Attention Control

Hertz, Mokady, Tenenbaum et al. 2022

Preprint

Self Cite

126


Recent large-scale text-driven synthesis models have attracted much attention thanks to their remarkable capabilities of generating highly diverse images that follow given text prompts. Such text-based synthesis methods are particularly appealing to humans who are used to verbally describe their intent. Therefore, it is only natural to extend the text-driven image synthesis to text-driven image editing. Editing is challenging for these generative models, since an innate property of an editing technique is to preserve most of the original image, while in the text-based models, even a small modification of the text prompt often leads to a completely different outcome. State-of-the-art methods mitigate this by requiring the users to provide a spatial mask to localize the edit, hence, ignoring the original structure and content within the masked region. In this paper, we pursue an intuitive prompt-to-prompt editing framework, where the edits are controlled by text only. To this end, we analyze a text-conditioned model in depth and observe that the cross-attention layers are the key to controlling the relation between the spatial layout of the image to each word in the prompt. With this observation, we present several applications which monitor the image synthesis by editing the textual prompt only. This includes localized editing by replacing a word, global editing by adding a specification, and even delicately controlling the extent to which a word is reflected in the image. We present our results over diverse images and prompts, demonstrating high-quality synthesis and fidelity to the edited prompts.

* Performed this work while working at Google.


“…Lin et al. [137] (Mar 2022) propose a method for multi-view consistent video editing and animation based on 3D GAN inversion. They invert the video frames into the latent space of a pi-GAN by using pivotal tuning inversion (PTI) [146] and edit face attributes by using StyleFlow [97]. IDE-3D [61] adopts a hybrid GAN inversion approach.…”

Section: Conditional 3D Generative Models (mentioning)

confidence: 99%

A Survey on 3D-aware Image Synthesis

Xia, Xue 2022

Preprint


Recent years have seen remarkable progress in deep learning powered visual content creation. This includes 3D-aware generative image synthesis, which produces high-fidelity images in a 3D-consistent manner while simultaneously capturing compact surfaces of objects from pure image collections without the need for any 3D supervision, thus bridging the gap between 2D imagery and 3D reality. The 3D-aware generative models have shown that the introduction of 3D information can lead to more controllable image generation. The task of 3D-aware image synthesis has taken the field of computer vision by storm, with hundreds of papers accepted to top-tier journals and conferences in recent years (mainly the past two years), but a comprehensive survey of this remarkable and swift progress is lacking. Our survey aims to introduce new researchers to this topic, provide a useful reference for related works, and stimulate future research directions through our discussion section. Apart from the presented papers, we aim to constantly update the latest relevant papers along with corresponding implementations at https://weihaox.github.io/awesome-3D-aware-synthesis.


“…Recent works [55, 19,3,45,56,67,48] have demonstrated semantic manipulation, especially for facial attributes, by analyzing the manifold and finding meaningful direction or mapping. Combining with GAN inversion [1,73,2,52,64,53,4,5], the applications of 2D GANs have been extended to real image editing. Alternatively, there have been studies [11,27,36,25] that discover and disentangle latent embeddings into interpretable dimensions during training of the generator.…”

Section: Related Work (mentioning)

confidence: 99%

“…despite their capability of multi-view consistency. Recently proposed EG3D [8] has shown experiments of novel view synthesis and presented outstanding results, but it requires iterative optimization for latent code and fine-tuning of the generator [53] for each target image.…”

Section: Related Work (mentioning)

confidence: 99%

Injecting 3D Perception of Controllable NeRF-GAN into StyleGAN for Editable Portrait Image Synthesis

Kwak, Li, Yoon et al. 2022

Preprint


Over the years, 2D GANs have achieved great successes in photorealistic portrait generation. However, they lack 3D understanding in the generation process, and thus suffer from the multi-view inconsistency problem. To alleviate the issue, many 3D-aware GANs have been proposed and shown notable results, but 3D GANs struggle with editing semantic attributes. The controllability and interpretability of 3D GANs have not been much explored. In this work, we propose two solutions to overcome these weaknesses of 2D GANs and 3D-aware GANs. We first introduce a novel 3D-aware GAN, SURF-GAN, which is capable of discovering semantic attributes during training and controlling them in an unsupervised manner. After that, we inject the prior of SURF-GAN into StyleGAN to obtain a high-fidelity 3D-controllable generator. Unlike existing latent-based methods allowing implicit pose control, the proposed 3D-controllable StyleGAN enables explicit pose control over portrait generation. This distillation allows direct compatibility between 3D control and many StyleGAN-based techniques (e.g., inversion and stylization), and also brings an advantage in terms of computational resources. Our codes are available at https://github.com/jgkwak95/SURF-GAN.
