2022
DOI: 10.48550/arxiv.2208.01618
Preprint

An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion

Abstract: Text-to-image models offer unprecedented freedom to guide creation through natural language. Yet, it is unclear how such freedom can be exercised to generate images of specific unique concepts, modify their appearance, or compose them in new roles and novel scenes. In other words, we ask: how can we use language-guided models to turn our cat into a painting, or imagine a new product based on our favorite toy? Here we present a simple approach that allows such creative freedom. Using only 3-5 images of a user-provided…
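The core idea described in the abstract is to keep the pre-trained generative model entirely frozen and optimize a single new token embedding (a pseudo-word, often written "S*") against the usual diffusion denoising loss on the handful of concept images. Below is a minimal, self-contained sketch of that idea, assuming tiny stand-in modules for the frozen text encoder and denoiser; all names, shapes, and hyperparameters are illustrative, not the authors' implementation, and the noise schedule is omitted for brevity.

```python
import torch
import torch.nn.functional as F

# Hypothetical stand-ins for the frozen components of a pre-trained
# latent diffusion model; real textual inversion would use the model's
# own text encoder and U-Net denoiser.
EMBED_DIM = 768                                  # text-embedding width
LATENT_SHAPE = (1, 4, 64, 64)                    # latent image shape
LATENT_NUMEL = 4 * 64 * 64

# The single trainable parameter: the embedding of the new pseudo-word "S*".
v_star = torch.nn.Parameter(0.01 * torch.randn(EMBED_DIM))

# Frozen stand-ins for the pre-trained model.
text_encoder = torch.nn.TransformerEncoder(
    torch.nn.TransformerEncoderLayer(EMBED_DIM, nhead=8, batch_first=True),
    num_layers=2,
)
denoiser = torch.nn.Linear(LATENT_NUMEL + EMBED_DIM, LATENT_NUMEL)
for module in (text_encoder, denoiser):
    module.requires_grad_(False)                 # model weights stay frozen

optimizer = torch.optim.AdamW([v_star], lr=5e-3)

# Stand-ins for the 3-5 user-provided images (pre-encoded to latents)
# and for the frozen embeddings of a template prompt like "a photo of".
concept_latents = [torch.randn(LATENT_SHAPE) for _ in range(4)]
template_embeds = torch.randn(1, 8, EMBED_DIM)

for step in range(200):
    latents = concept_latents[step % len(concept_latents)]

    # Splice the learnable embedding into the prompt at the new word's slot.
    prompt_embeds = torch.cat([template_embeds, v_star.view(1, 1, -1)], dim=1)
    cond = text_encoder(prompt_embeds).mean(dim=1)   # pooled text condition

    # Standard denoising objective: predict the noise added to the latents,
    # conditioned on the prompt that contains the new "word".
    noise = torch.randn_like(latents)
    noisy = latents + noise
    pred = denoiser(torch.cat([noisy.flatten(1), cond], dim=1)).view_as(latents)
    loss = F.mse_loss(pred, noise)

    optimizer.zero_grad()
    loss.backward()      # gradients flow only into v_star
    optimizer.step()
```

After optimization, the learned embedding is simply inserted wherever the pseudo-word appears in a prompt (e.g. "a painting of S*"), so the frozen model composes the personal concept with ordinary language.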

Cited by 96 publications (170 citation statements)
References 38 publications
“…Another related line of work aims to introduce specific concepts to a pre-trained text-to-image model by learning to map a set of images to a "word" in the embedding space of the model [18,25,41]. Several works have also explored providing users with more control over the synthesis process solely through the use of the input text prompt [8,20,24,46].…”
Section: Related Work
confidence: 99%
“…This unprecedented capability became instantly popular, as users were able to synthesize high-quality images by simply describing the desired result in natural language, as we demonstrate in Figure 9.5. These models have become a centerpiece in an ongoing and quickly advancing research area, as they have been adapted for image editing [147,202], object recontextualization [241,95], 3D object generation [220], and more [119,129,213,346].…”
Section: Regularization by Denoising (RED)
confidence: 99%
“…Transfer learning aims to adapt models to unseen domains and tasks [47,61,87,102,128,131]. For generative models like GANs and diffusion models, several works have proposed finetuning a pre-trained network to enable image generation in an unseen limited-data domain [33,66,77,81,82,101,124,125,134]. Various works have explored model selection to choose pre-trained models for discriminative tasks [12,27,80,93] or selecting pre-trained discriminators for training GANs [63].…”
Section: Related Work
confidence: 99%
“…Each model captures a small universe of curated subjects, which can range from realistic rendering of faces and landscapes [56] to photos of historic pottery [5] to cartoon caricatures [50] to single-artist stylistic elements [108]. More recently, various methods enable creative modifications and personalization of existing models, via human-in-the-loop interfaces [8,34,122,123] or fine-tuning of GANs [83,125,135] and text-to-image models [33,101]. Each generative model can represent a substantial investment in a specific idea of the model creator.…”
Section: Introduction
confidence: 99%