2022
DOI: 10.48550/arxiv.2208.01618
Preprint

An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion

Abstract: Text-to-image models offer unprecedented freedom to guide creation through natural language. Yet, it is unclear how such freedom can be exercised to generate images of specific unique concepts, modify their appearance, or compose them in new roles and novel scenes. In other words, we ask: how can we use language-guided models to turn our cat into a painting, or imagine a new product based on our favorite toy? Here we present a simple approach that allows such creative freedom. Using only 3-5 images of a user-provided…
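The core idea described in the abstract is to keep the pre-trained generative model entirely frozen and optimize a single new token embedding (a pseudo-word, often written "S*") against the usual diffusion denoising loss on the handful of concept images. Below is a minimal, self-contained sketch of that idea, assuming tiny stand-in modules for the frozen text encoder and denoiser; all names, shapes, and hyperparameters are illustrative, not the authors' implementation, and the noise schedule is omitted for brevity.

```python
import torch
import torch.nn.functional as F

# Hypothetical stand-ins for the frozen components of a pre-trained
# latent diffusion model; real textual inversion would use the model's
# own text encoder and U-Net denoiser.
EMBED_DIM = 768                                  # text-embedding width
LATENT_SHAPE = (1, 4, 64, 64)                    # latent image shape
LATENT_NUMEL = 4 * 64 * 64

# The single trainable parameter: the embedding of the new pseudo-word "S*".
v_star = torch.nn.Parameter(0.01 * torch.randn(EMBED_DIM))

# Frozen stand-ins for the pre-trained model.
text_encoder = torch.nn.TransformerEncoder(
    torch.nn.TransformerEncoderLayer(EMBED_DIM, nhead=8, batch_first=True),
    num_layers=2,
)
denoiser = torch.nn.Linear(LATENT_NUMEL + EMBED_DIM, LATENT_NUMEL)
for module in (text_encoder, denoiser):
    module.requires_grad_(False)                 # model weights stay frozen

optimizer = torch.optim.AdamW([v_star], lr=5e-3)

# Stand-ins for the 3-5 user-provided images (pre-encoded to latents)
# and for the frozen embeddings of a template prompt like "a photo of".
concept_latents = [torch.randn(LATENT_SHAPE) for _ in range(4)]
template_embeds = torch.randn(1, 8, EMBED_DIM)

for step in range(200):
    latents = concept_latents[step % len(concept_latents)]

    # Splice the learnable embedding into the prompt at the new word's slot.
    prompt_embeds = torch.cat([template_embeds, v_star.view(1, 1, -1)], dim=1)
    cond = text_encoder(prompt_embeds).mean(dim=1)   # pooled text condition

    # Standard denoising objective: predict the noise added to the latents,
    # conditioned on the prompt that contains the new "word".
    noise = torch.randn_like(latents)
    noisy = latents + noise
    pred = denoiser(torch.cat([noisy.flatten(1), cond], dim=1)).view_as(latents)
    loss = F.mse_loss(pred, noise)

    optimizer.zero_grad()
    loss.backward()      # gradients flow only into v_star
    optimizer.step()
```

After optimization, the learned embedding is simply inserted wherever the pseudo-word appears in a prompt (e.g. "a painting of S*"), so the frozen model composes the personal concept with ordinary language.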

Cited by 96 publications (170 citation statements)
References 38 publications
“…Another related line of work aims to introduce specific concepts to a pre-trained text-to-image model by learning to map a set of images to a "word" in the embedding space of the model [18,25,41]. Several works have also explored providing users with more control over the synthesis process solely through the use of the input text prompt [8,20,24,46].…”
Section: Related Work
confidence: 99%
“…This unprecedented capability became instantly popular, as users were able to synthesize high-quality images by simply describing the desired result in natural language, as we demonstrate in Figure 9.5. These models have become a centerpiece in an ongoing and quickly advancing research area, as they have been adapted for image editing [147,202], object recontextualization [241,95], 3D object generation [220], and more [119,129,213,346].…”
Section: Regularization by Denoising (RED)
confidence: 99%
“…Transfer learning aims to adapt models to unseen domains and tasks [47,61,87,102,128,131]. For generative models like GANs and diffusion models, several works have proposed finetuning a pre-trained network to enable image generation in an unseen limited-data domain [33,66,77,81,82,101,124,125,134]. Various works have explored model selection to choose pre-trained models for discriminative tasks [12,27,80,93] or selecting pre-trained discriminators for training GANs [63].…”
Section: Related Work
confidence: 99%
“…Each model captures a small universe of curated subjects, which can range from realistic rendering of faces and landscapes [56] to photos of historic pottery [5] to cartoon caricatures [50] to single-artist stylistic elements [108]. More recently, various methods enable creative modifications and personalization of existing models, via human-in-the-loop interfaces [8,34,122,123] or fine-tuning of GANs [83,125,135] and text-to-image models [33,101]. Each generative model can represent a substantial investment in a specific idea of the model creator.…”
Section: Introduction
confidence: 99%