2019
DOI: 10.1609/aaai.v33i01.33013272

Adversarial Learning of Semantic Relevance in Text to Image Synthesis

Abstract: We describe a new approach that improves the training of generative adversarial nets (GANs) for synthesizing diverse images from a text input. Our approach is based on the conditional version of GANs and expands on previous work leveraging an auxiliary task in the discriminator. Our generated images are not limited to certain classes and do not suffer from mode collapse while semantically matching the text input. A key to our training methods is how to form positive and negative training examples with respect …
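The abstract's emphasis on forming positive and negative training examples can be illustrated with a short sketch. The following is a hypothetical example, not the authors' code: it assembles the three kinds of discriminator inputs commonly used in matching-aware text-to-image GANs (real image with its matching caption, real image with a mismatching caption, and a generated image with its conditioning caption). All names, shapes, and the roll-based mismatching scheme are illustrative assumptions.

```python
# Minimal sketch (not the authors' code) of forming positive and negative
# caption-image training examples for a matching-aware discriminator.
# `images` and `captions` are assumed to be aligned arrays of real images
# and caption embeddings; names and shapes are illustrative.
import numpy as np

def make_discriminator_batches(images, captions, generated):
    """Return (images, captions, targets) triples for one training step.

    images    : (B, ...) real images, aligned with `captions`
    captions  : (B, D)   embeddings of the matching captions
    generated : (B, ...) images synthesized from `captions`
    """
    batch = len(images)
    # Positive examples: real image paired with its own caption.
    pos = (images, captions, np.ones(batch))
    # Negative examples (type 1): real image with a mismatching caption,
    # obtained here by rolling the caption batch by one position.
    mismatched = np.roll(captions, shift=1, axis=0)
    neg_mismatch = (images, mismatched, np.zeros(batch))
    # Negative examples (type 2): generated image with the caption it was
    # conditioned on.
    neg_generated = (generated, captions, np.zeros(batch))
    return [pos, neg_mismatch, neg_generated]

# Toy usage with random data, just to show the shapes involved.
if __name__ == "__main__":
    imgs = np.random.rand(4, 64, 64, 3)
    caps = np.random.rand(4, 128)
    fakes = np.random.rand(4, 64, 64, 3)
    for x, c, y in make_discriminator_batches(imgs, caps, fakes):
        print(x.shape, c.shape, y.shape)
```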

Cited by 52 publications (33 citation statements)
References 17 publications
“…The matching-aware discriminator is trained to distinguish between real and matching caption-image pairs ("real"), real but mismatching caption-image pairs ("fake"), and matching captions with generated images ("fake"). [17] modify the sampling procedure during training to obtain a curriculum of mismatching caption-image pairs and introduce an auxiliary classifier that specifically predicts the semantic consistency of a given caption-image pair. [9], [18] use multiple generators and discriminators and are among the first to achieve good image quality at resolutions of 256 × 256 on complex data sets.…”
Section: Related Work
confidence: 99%
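The curriculum of mismatching caption-image pairs mentioned in the statement above can be sketched roughly as follows. This is an assumption-laden illustration rather than the paper's actual sampling procedure: negatives are drawn from captions far away in embedding space early in training and from progressively closer (harder) ones later; the function name and the linear difficulty schedule are hypothetical.

```python
# Illustrative curriculum over mismatching caption-image pairs (not the
# paper's exact procedure).  `progress` is the fraction of training done.
import numpy as np

def sample_mismatching_captions(captions, progress):
    """captions: (B, D) embeddings of the matching captions; progress in [0, 1]."""
    b = len(captions)
    # Pairwise Euclidean distances between caption embeddings.
    dists = np.linalg.norm(captions[:, None, :] - captions[None, :, :], axis=-1)
    negatives = np.empty_like(captions)
    for i in range(b):
        others = np.delete(np.arange(b), i)           # candidate indices != i
        order = others[np.argsort(dists[i, others])]  # nearest ... farthest
        # progress 0 -> pick the farthest caption (easy negative),
        # progress 1 -> pick the nearest caption (hard negative).
        rank = int(round((1.0 - progress) * (len(order) - 1)))
        negatives[i] = captions[order[rank]]
    return negatives

# Toy usage: early negatives are the most dissimilar captions, late ones the closest.
caps = np.random.rand(6, 128)
easy = sample_mismatching_captions(caps, progress=0.0)
hard = sample_mismatching_captions(caps, progress=1.0)
```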
“…The final output is a discriminator similar to that of a generic GAN; (b) the manifold interpolation matching-aware discriminator GAN (GAN-INT-CLS) (Reed, Akata, Yan, et al.) feeds the text input to both the generator and the discriminator (texts are preprocessed into embedding features using a function φ(·) and concatenated with the other inputs before being fed to both networks). The final output is a discriminator similar to that of a generic GAN; (c) the auxiliary classifier GAN (AC-GAN) (Odena, Olah, & Shlens) uses an auxiliary classifier layer to predict the class of the image, ensuring that the output consists of images from different classes and thus diversifying the synthesized images; (d) the text-conditioned auxiliary classifier GAN (TAC-GAN) (Dash, Gamboa, Ahmed, Afzal, & Liwicki) shares a similar design with GAN-INT-CLS, but its output includes both a discriminator and a classifier (which can be used for classification); and (e) the text-conditioned semantic classifier GAN (Text-SeGAN) (Cha, Gwon, & Kung) uses a regression layer to estimate the semantic relevance between the image and the text, so the generated images are not limited to certain classes and semantically match the text input…”
Section: Preliminaries and Framework
confidence: 99%
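A minimal PyTorch sketch of the architectural contrast described above may help: an AC-GAN-style discriminator adds an auxiliary classification head over image classes, whereas a Text-SeGAN-style discriminator replaces it with a regression head that scores the semantic relevance between the image and the caption embedding. The layer sizes, the concatenation-based text-image fusion, and all names are illustrative assumptions, not the published architectures.

```python
# Illustrative discriminator with a shared image encoder and either an
# AC-GAN-style class head or a Text-SeGAN-style relevance head.
import torch
import torch.nn as nn

class TextImageDiscriminator(nn.Module):
    def __init__(self, img_channels=3, text_dim=128, num_classes=None):
        super().__init__()
        self.features = nn.Sequential(           # shared image encoder
            nn.Conv2d(img_channels, 64, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        fused = 128 + text_dim
        self.real_fake = nn.Linear(fused, 1)      # standard GAN real/fake output
        if num_classes is not None:               # AC-GAN-style class prediction
            self.aux = nn.Linear(fused, num_classes)
        else:                                     # Text-SeGAN-style relevance regression
            self.aux = nn.Linear(fused, 1)

    def forward(self, image, text_embedding):
        h = torch.cat([self.features(image), text_embedding], dim=1)
        return self.real_fake(h), self.aux(h)

# Toy usage: score random images against random caption embeddings.
disc = TextImageDiscriminator(text_dim=128, num_classes=None)
rf, relevance = disc(torch.randn(2, 3, 64, 64), torch.randn(2, 128))
print(rf.shape, relevance.shape)  # (2, 1) and (2, 1)
```

In this sketch the only difference between the two variants is the auxiliary head, which mirrors how Text-SeGAN is described relative to TAC-GAN in the statement above.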
“…For example, a recent work (Gao et al.) proposes to use a pyramid generator and three independent discriminators, each focusing on a different aspect of the images, to lead the generator toward creating images that are photorealistic on multiple levels. Another recent publication (Cha, Gwon, & Kung) proposes to use the discriminator to measure the semantic relevance between image and text instead of predicting a class (as most GAN discriminators do), resulting in a new GAN structure that outperforms the text-conditioned auxiliary classifier GAN (TAC-GAN) (Dash, Gamboa, Ahmed, Afzal, & Liwicki) and generates images that are diverse, realistic, and relevant to the input text regardless of class.…”
Section: Preliminaries and Framework
confidence: 99%
“…Nguyen et al. [17] introduced the PPGN, which is similar to TAC-GAN and contains a conditional network, to generate images from captions. Furthermore, based on conditional GANs, Cha et al. [18] improved the adversarial training process by forming positive and negative label pairs and employing an auxiliary classifier to predict the semantic consistency of a given image-caption pair.…”
Section: A. Single-stage Text-to-image Generation
confidence: 99%
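A hedged sketch of how such positive and negative image-caption pairs could enter one discriminator update: matching pairs are pushed toward "consistent", while mismatching and generated pairs are pushed toward "inconsistent". The binary cross-entropy formulation, the equal loss weighting, and the `disc` interface (any callable returning a real/fake logit and a relevance logit) are assumptions for illustration, not the paper's exact objective.

```python
# Illustrative discriminator loss over the three pair types; not the
# published training objective.
import torch
import torch.nn.functional as F

def discriminator_loss(disc, real_imgs, fake_imgs, captions, wrong_captions):
    ones = torch.ones(real_imgs.size(0), 1)
    zeros = torch.zeros(real_imgs.size(0), 1)
    _, rel_pos = disc(real_imgs, captions)           # real image + matching caption
    _, rel_mis = disc(real_imgs, wrong_captions)     # real image + mismatching caption
    _, rel_gen = disc(fake_imgs.detach(), captions)  # generated image + its caption
    return (F.binary_cross_entropy_with_logits(rel_pos, ones)
            + F.binary_cross_entropy_with_logits(rel_mis, zeros)
            + F.binary_cross_entropy_with_logits(rel_gen, zeros))

# Toy usage with a stand-in discriminator that returns random logits.
toy_disc = lambda img, cap: (torch.randn(img.size(0), 1), torch.randn(img.size(0), 1))
loss = discriminator_loss(toy_disc,
                          torch.randn(4, 3, 64, 64), torch.randn(4, 3, 64, 64),
                          torch.randn(4, 128), torch.randn(4, 128))
print(float(loss))
```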