Oron Ashual scite author profile

We introduce a method for generating images from an input scene graph. The method separates a layout embedding from an appearance embedding. This dual embedding yields generated images that better match the scene graph, have higher visual quality, and support more complex scene graphs. In addition, the embedding scheme supports multiple, diverse output images per scene graph, which the user can further control. We demonstrate two modes of per-object control: (i) importing elements from other images, and (ii) navigating the object space by selecting an appearance archetype. Our code is publicly available at https
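The separation between layout and appearance can be pictured, very loosely, as a data structure in which each object in the scene graph carries two independent vectors, so that one can be replaced without touching the other. The sketch below illustrates only that separation under stated assumptions. It is not the paper's network. `SceneObject`, `SceneGraph`, `with_appearance`, and the toy vector values are all hypothetical. In the actual method these embeddings would come from learned encoders rather than hand-written lists.

```python
from __future__ import annotations

import copy
from dataclasses import dataclass, field


@dataclass
class SceneObject:
    category: str
    # Hypothetical per-object vectors (assumption): "layout" stands for where and
    # at what shape the object is placed, "appearance" for how it looks.
    layout: list[float]
    appearance: list[float]


@dataclass
class SceneGraph:
    objects: list[SceneObject]
    # Relations stored as (subject index, predicate, object index) triples.
    relations: list[tuple[int, str, int]] = field(default_factory=list)


def with_appearance(graph: SceneGraph, idx: int, appearance: list[float]) -> SceneGraph:
    """Return a copy of `graph` in which object `idx` receives a new appearance
    vector while its layout vector and all relations stay unchanged."""
    new_graph = copy.deepcopy(graph)
    new_graph.objects[idx].appearance = list(appearance)
    return new_graph


graph = SceneGraph(
    objects=[
        SceneObject("sky", layout=[0.0, 0.0, 1.0, 0.4], appearance=[0.1, 0.9, 0.3]),
        SceneObject("tree", layout=[0.3, 0.4, 0.6, 1.0], appearance=[0.5, 0.2, 0.7]),
    ],
    relations=[(0, "above", 1)],
)

# Toy stand-in for an appearance vector taken from an object in another image.
donor_appearance = [0.8, 0.1, 0.4]
edited = with_appearance(graph, 1, donor_appearance)
```

Replacing an object's appearance with one taken from another image corresponds loosely to control mode (i); picking one vector from a fixed set of archetypes would correspond to mode (ii). Because the layout vector is left untouched, the object stays where the scene graph put it.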


Make-A-Scene: Scene-Based Text-to-Image Generation with Human Priors

Gafni, Polyak, Ashual, et al. 2022

Preprint

143


Specifying Object Attributes and Relations in Interactive Scene Generation

Ashual, Wolf 2019

Preprint


KNN-Diffusion: Image Generation via Large-Scale Retrieval

Ashual, Sheynin, Polyak, et al. 2022

Preprint


While massive text-image datasets have proven extremely useful for training large-scale generative models (e.g. DDPMs, Transformers), the output of such models typically depends on the quality of both the input text and the training dataset. In this work, we show how large-scale retrieval methods, in particular efficient K-Nearest-Neighbors (KNN) search, can be used to train a model that adapts to new samples. Learning to adapt enables several new capabilities. Sifting through billions of records at inference time is extremely efficient and can alleviate the need to train or memorize an adequately large generative model. Additionally, trained models can be fine-tuned to new samples simply by adding those samples to the table. Rare concepts, even ones absent from the training set, can then be leveraged at test time without any modification to the generative model. Our diffusion-based model trains on images only, by leveraging a joint text-image multi-modal metric. Compared to baseline methods, our generations achieve state-of-the-art results in both human evaluations and perceptual scores, when tested on a public multimodal dataset of natural images as well as on a collected dataset of 400 million stickers.
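To make the retrieval step concrete, here is a minimal sketch of a KNN lookup over a table of embeddings, including the property that a new sample becomes usable simply by being added to the table. It assumes embeddings from some joint text-image encoder; toy 3-dimensional vectors stand in for them, and `RetrievalTable`, `add`, and `knn` are hypothetical names. For clarity it performs exact brute-force cosine search; searching billions of records efficiently, as the abstract describes, would require an approximate nearest-neighbor index instead. The generative model that consumes the retrieved neighbors is not shown.

```python
import math
from dataclasses import dataclass, field


def normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


@dataclass
class RetrievalTable:
    """Hypothetical in-memory table of (embedding, item) pairs."""
    keys: list = field(default_factory=list)   # unit-length embeddings
    items: list = field(default_factory=list)  # payload stored with each key

    def add(self, embedding, item):
        # New samples are only appended to the table; no model weights change.
        self.keys.append(normalize(embedding))
        self.items.append(item)

    def knn(self, query, k):
        # Exact brute-force search: score every key by cosine similarity.
        q = normalize(query)
        scored = [
            (sum(a * b for a, b in zip(q, key)), item)
            for key, item in zip(self.keys, self.items)
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [item for _, item in scored[:k]]


# Toy 3-d vectors stand in for embeddings from a joint text-image encoder.
table = RetrievalTable()
table.add([1.0, 0.0, 0.0], "photo of a dog")
table.add([0.0, 1.0, 0.0], "photo of a cat")
table.add([0.7, 0.7, 0.0], "dog and cat")
neighbours = table.knn([0.9, 0.1, 0.0], k=2)

# A concept absent from the table becomes retrievable once it is added.
table.add([0.0, 0.0, 1.0], "sticker of a unicorn")
rare = table.knn([0.0, 0.1, 1.0], k=1)
```

Here `neighbours` is `["photo of a dog", "dog and cat"]`, and after the unicorn entry is appended, `rare` is `["sticker of a unicorn"]`. Because the query and the keys live in a shared embedding space, the same table could in principle be queried with an image embedding during training and a text embedding at inference.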
