2022
DOI: 10.48550/arxiv.2208.01626
Preprint

Prompt-to-Prompt Image Editing with Cross Attention Control

Abstract: Recent large-scale text-driven synthesis models have attracted much attention thanks to their remarkable capabilities of generating highly diverse images that follow given text prompts. Such text-based synthesis methods are particularly appealing to humans who are used to verbally describe their intent. Therefore, it is only natural to extend the text-driven image synthesis to text-driven image editing. Editing is challenging for these generative models, since an innate property of an editing technique is to p…

Cited by 72 publications (154 citation statements) | References 31 publications

“…Another related line of work aims to introduce specific concepts to a pre-trained text-to-image model by learning to map a set of images to a "word" in the embedding space of the model [18,25,41]. Several works have also explored providing users with more control over the synthesis process solely through the use of the input text prompt [8,20,24,46].…”
Section: Related Work
confidence: 99%
“…We operate over the 16 × 16 attention units since they have been shown to contain the most semantic information [20].…”
Section: SD Generated Image
confidence: 99%
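The 16 × 16 cross-attention units mentioned in the statement above associate each spatial position of the diffusion U-Net's feature map with each text token. A minimal NumPy sketch of turning such a layer's output into per-token spatial maps — all names and shapes here are illustrative assumptions, not an official API:

```python
import numpy as np

def token_attention_maps(cross_attn, h=16, w=16):
    """Average multi-head cross-attention over heads and reshape each
    text token's attention into a spatial (h, w) map.

    cross_attn: array of shape (heads, h*w, n_tokens), as a
    cross-attention layer operating at 16x16 resolution might produce.
    """
    avg = cross_attn.mean(axis=0)     # (h*w, n_tokens): average over heads
    maps = avg.T.reshape(-1, h, w)    # (n_tokens, h, w): one map per token
    return maps

# Toy example: 8 heads, 16x16 spatial positions, 5 prompt tokens.
attn = np.random.rand(8, 16 * 16, 5)
maps = token_attention_maps(attn)
print(maps.shape)  # (5, 16, 16)
```

Each resulting 16 × 16 map can be inspected (or upsampled) to see which image regions a given prompt token attends to, which is why these units are said to carry the most semantic information.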
“…Recently, a line of works [33, 28] focuses on improving the sampling speed of diffusion models, by either altering the Markovian noising process or embedding the diffusion steps into a learned latent space. Another group [15, 3, 10] studies applications of diffusion models such as text-guided image manipulation.…”
Section: Related Work
confidence: 99%
“…Blended-diffusion [3] uses a user-provided mask and a textual prompt during the diffusion process to blend the target and the existing background iteratively. A concurrent work of ours, prompt-to-prompt [10], captures the text cross-attention structure to enable purely prompt-based scene editing without any explicit masks. In our work, we take blended-diffusion as the starting point, and incorporate a domain-specific classifier and its attention structure for mask-free multi-attribute fashion image manipulation.…”
Section: Related Work
confidence: 99%
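The mask-free editing described in the statement above works by injecting the source prompt's cross-attention maps into the edited generation. A simplified sketch of that injection schedule — the function name, shapes, and the `tau` cutoff are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def inject_cross_attention(attn_src, attn_tgt, step, n_steps, tau=0.8):
    """Prompt-to-prompt-style cross-attention injection (sketch).

    For the first `tau` fraction of diffusion steps, the edited branch
    reuses the source prompt's cross-attention maps, which preserves
    the spatial layout of the scene; in the remaining steps the target
    prompt's own attention takes over so the edited content can form.
    """
    if step < tau * n_steps:
        return attn_src
    return attn_tgt

src = np.zeros((16 * 16, 5))   # source-prompt attention (positions x tokens)
tgt = np.ones((16 * 16, 5))    # target-prompt attention
early = inject_cross_attention(src, tgt, step=10, n_steps=50)
late = inject_cross_attention(src, tgt, step=45, n_steps=50)
print(early is src, late is tgt)  # True True
```

Because layout is fixed in the early denoising steps, swapping a word in the prompt changes only the corresponding object while the rest of the scene stays put, without any user-drawn mask.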