2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition
DOI: 10.1109/cvpr.2018.00833

Inferring Semantic Layout for Hierarchical Text-to-Image Synthesis

Abstract: We propose a novel hierarchical approach for text-to-image synthesis by inferring semantic layout. Instead of learning a direct mapping from text to image, our algorithm decomposes the generation process into multiple steps: it first constructs a semantic layout from the text with a layout generator, then converts the layout into an image with an image generator. The proposed layout generator progressively constructs a semantic layout in a coarse-to-fine manner by generating object bounding boxes and refini…

Cited by 333 publications (297 citation statements)
References 25 publications (56 reference statements)
“…[212] includes an extra object pathway in both the generator and the discriminator to explicitly control object locations. [213] employs a two-stage procedure that first builds a semantic layout automatically from the input sentence with LSTM-based box and shape generators, and then synthesizes the image using an image generator and discriminators. Since fine-grained word/object-level information is not explicitly used for generation, such synthesized images do not contain enough details to make them look realistic.…”
Section: Semantic Layout Control For Complex Scenes
confidence: 99%
“…Reed et al. [24] perform image generation from sentence input along with additional information in the form of keypoints or bounding boxes. Hong et al. [11] break down the process of generating an image from a sentence into multiple stages. The input sentence is first used to predict the objects that are present in the scene, followed by prediction of bounding boxes, then semantic segmentation masks, and finally the image.…”
Section: Related Work
confidence: 99%
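The multi-stage pipeline this citation statement describes (objects → bounding boxes → masks → image) can be sketched as a chain of toy functions. Everything below is a hypothetical stand-in for illustration only: the actual method uses learned LSTM box/mask generators and a convolutional image generator, not the rule-based stubs shown here.

```python
"""Toy sketch of hierarchical text-to-image generation as a staged pipeline.

All stage bodies are hypothetical placeholders; only the staging itself
mirrors the approach described above.
"""
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height)


def predict_objects(sentence: str) -> List[str]:
    """Stage 1 (stub): decide which object classes the sentence mentions."""
    known = {"sheep", "grass", "sky", "person", "dog"}  # toy vocabulary
    return [w.strip(".,") for w in sentence.lower().split()
            if w.strip(".,") in known]


def predict_boxes(objects: List[str], canvas: int = 64) -> Dict[str, Box]:
    """Stage 2 (stub): place one coarse bounding box per object."""
    step = canvas // max(len(objects), 1)
    return {obj: (i * step, canvas // 4, step, canvas // 2)
            for i, obj in enumerate(objects)}


def predict_masks(boxes: Dict[str, Box]) -> Dict[str, List[List[int]]]:
    """Stage 3 (stub): refine each box into a binary shape mask (here: filled)."""
    return {obj: [[1] * w for _ in range(h)]
            for obj, (_, _, w, h) in boxes.items()}


def synthesize_label_map(boxes: Dict[str, Box],
                         masks: Dict[str, List[List[int]]],
                         canvas: int = 64) -> List[List[int]]:
    """Stage 4 (stub): composite the masks into a semantic label map.

    A real system would feed this layout to an image generator network
    to produce pixels; here we just paint integer class labels.
    """
    label = {obj: i + 1 for i, obj in enumerate(boxes)}
    img = [[0] * canvas for _ in range(canvas)]
    for obj, (x, y, w, h) in boxes.items():
        mask = masks[obj]
        for r in range(h):
            for c in range(w):
                if mask[r][c]:
                    img[y + r][x + c] = label[obj]
    return img
```

Chaining the stages, e.g. on the sentence "A sheep standing on the grass.", yields a label map in which the sheep and grass regions carry distinct labels — the layout that a downstream image generator would condition on.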
“…Despite the rapid progress and recent successes in object generation (e.g., celebrity face, animals, etc.) [1,9,13] and scene generation [4,11,12,19,22,30,31], little attention has been paid to frameworks designed for stochastic semantic layout generation. Having a robust model for layout generation will not only allow us to generate reliable scene layouts, but also provide priors and means to infer latent relationships between objects, advancing progress in the scene understanding domain.…”
Section: Introduction
confidence: 99%
“…As clip arts in an abstract scene can be easily generalized to object bounding boxes in a semantic layout, this concept extends to real images [20]. Predicting a semantic layout from text is usually posed as an intermediate step for complex image generation [7] [9]. A complex image refers to one containing multiple interacting objects.…”
Section: Related Work
confidence: 99%
“…Figure 1: The Seq-SG2SL framework for inferring semantic layout from a scene graph. A scene graph [11] [13] serves as the semantic description and a semantic layout [9] [26] as the image representation. Therefore, our goal in this work is to solve the underlying task of inferring semantic layout from a scene graph, connecting text to image.…”
Section: Introduction
confidence: 99%