Or Patashnik scite author profile

An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion

Text-to-image models offer unprecedented freedom to guide creation through natural language. Yet, it is unclear how such freedom can be exercised to generate images of specific unique concepts, modify their appearance, or compose them in new roles and novel scenes. In other words, we ask: how can we use language-guided models to turn our cat into a painting, or imagine a new product based on our favorite toy? Here we present a simple approach that allows such creative freedom. Using only 3-5 images of a user-provided concept, like an object or a style, we learn to represent it through new "words" in the embedding space of a frozen text-to-image model. These "words" can be composed into natural language sentences, guiding personalized creation in an intuitive way. Notably, we find evidence that a single word embedding is sufficient for capturing unique and varied concepts. We compare our approach to a wide range of baselines, and demonstrate that it can more faithfully portray the concepts across a range of applications and tasks. Our code, data and new words will be available at: https://textual-inversion.github.io

[Figure: input samples are inverted into a pseudo-word "S*", which is then used in prompts such as "An oil painting of S*", "App icon of S*", "Elmo sitting in the same pose as S*", "Crochet S*", "Painting of two S* fishing on a boat", "A S* backpack", "Banksy art of S*", and "A S* themed lunchbox".]

Preprint. Under review.

ReStyle: A Residual-Based StyleGAN Encoder via Iterative Refinement

Alaluf

2021

Encoding in Style: a StyleGAN Encoder for Image-to-Image Translation

Richardson, Alaluf, Patashnik et al. 2020

Preprint

136

Designing an encoder for StyleGAN image manipulation

et al. 2021

Recently, there has been a surge of diverse methods for performing image editing by employing pre-trained unconditional generators. Applying these methods on real images, however, remains a challenge, as it necessarily requires the inversion of the images into their latent space. To successfully invert a real image, one needs to find a latent code that reconstructs the input image accurately, and more importantly, allows for its meaningful manipulation. In this paper, we carefully study the latent space of StyleGAN, the state-of-the-art unconditional generator. We identify and analyze the existence of a distortion-editability tradeoff and a distortion-perception tradeoff within the StyleGAN latent space. We then suggest two principles for designing encoders in a manner that allows one to control the proximity of the inversions to regions that StyleGAN was originally trained on. We present an encoder based on our two principles that is specifically designed for facilitating editing on real images by balancing these tradeoffs. By evaluating its performance qualitatively and quantitatively on numerous challenging domains, including cars and horses, we show that our inversion method, followed by common editing techniques, achieves superior real-image editing quality, with only a small reconstruction accuracy drop.

StyleGAN-NADA

et al. 2022

Can a generative model be trained to produce images from a specific domain, guided only by a text prompt, without seeing any image? In other words: can an image generator be trained "blindly"? Leveraging the semantic power of large scale Contrastive-Language-Image-Pre-training (CLIP) models, we present a text-driven method that allows shifting a generative model to new domains, without having to collect even a single image. We show that through natural language prompts and a few minutes of training, our method can adapt a generator across a multitude of domains characterized by diverse styles and shapes. Notably, many of these modifications would be difficult or infeasible to reach with existing methods. We conduct an extensive set of experiments across a wide range of domains. These demonstrate the effectiveness of our approach, and show that our models preserve the latent-space structure that makes generative models appealing for downstream tasks. Code and videos available at: stylegan-nada.github.io/

Or Patashnik

Encoding in Style: a StyleGAN Encoder for Image-to-Image Translation

StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery

Designing an encoder for StyleGAN image manipulation

An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion
