2018
DOI: 10.1007/978-3-030-01237-3_37

Imagine This! Scripts to Compositions to Videos

Abstract: Imagining a scene described in natural language with realistic layout and appearance of entities is the ultimate test of spatial, visual, and semantic world knowledge. Towards this goal, we present the Composition, Retrieval and Fusion Network (Craft), a model capable of learning this knowledge from video-caption data and applying it while generating videos from novel captions. Craft explicitly predicts a temporal-layout of mentioned entities (characters and objects), retrieves spatio-temporal entity segments …

Cited by 51 publications (37 citation statements)
References 26 publications
“…For instance, story images have been retrieved from a pre-collected training set rather than generated [26]. Cartoon generation has been explored with a "cut and paste" technique [11]. However, both of these techniques require large amounts of labeled training data.…”
Section: Related Work
confidence: 99%
“…Kim et al [16] performed pictorial generation from chat logs, while our work uses text, which is considerably more underspecified. Gupta et al [9] proposed a semiparametric method to generate cartoon-like pictures. However, the presented objects were also provided as inputs to the model, and the predictions of layouts, foregrounds and backgrounds were performed by separately trained modules.…”
Section: Related Work
confidence: 99%
“…While scene layout generation in this work predicts probability distributions for bounding box layout, it fails to model the stochasticity intrinsic to predicting each bounding box. Gupta et al [8] use an approach similar to [11] to predict layouts for generating videos from scripts. Johnson et al [12] use the scene graph generated from the input sentence as input to the image generation model.…”
Section: Related Work
confidence: 99%