2022
DOI: 10.1109/tpami.2020.3021209

Semantic Object Accuracy for Generative Text-to-Image Synthesis

Abstract: Generative adversarial networks conditioned on textual image descriptions are capable of generating realistic-looking images. However, current methods still struggle to generate images based on complex image captions from a heterogeneous domain. Furthermore, quantitatively evaluating these text-to-image models is challenging, as most evaluation metrics only judge image quality but not the conformity between the image and its caption. To address these challenges, we introduce a new model that explicitly models i…

Cited by 93 publications (87 citation statements)
References: 36 publications

“…In the field of text-to-image synthesis, Hinz et al [61] introduce Semantic Object Accuracy (SOA) to evaluate images given an image caption. LayoutGAN is proposed by Li et al [62] for graphic design and scene generation, introducing wireframe rendering for image discrimination.…”
Section: B. GUI Generation (mentioning; confidence: 99%)
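The snippet above names Semantic Object Accuracy (SOA) without defining it. As a rough illustration, the sketch below assumes SOA runs a pretrained object detector on each generated image and checks whether the objects named in the image's caption are actually found. It reports a class-averaged recall (`SOA-C`) and a recall pooled over all caption-object pairs (`SOA-I`). The detector and the caption parsing are not shown: `mentioned` and `detected` are hypothetical precomputed inputs, and `soa_scores` is an illustrative name, not the paper's code.

```python
from typing import Dict, List, Set


def soa_scores(mentioned: List[Set[str]], detected: List[Set[str]]) -> Dict[str, float]:
    """Detection-recall scores in the spirit of SOA (illustrative sketch).

    mentioned[i]: object classes named in caption i (assumed already extracted).
    detected[i]:  classes a pretrained detector found in the image generated
                  from caption i (detector outputs are assumed to be given).
    """
    hits: Dict[str, int] = {}
    totals: Dict[str, int] = {}
    for want, found in zip(mentioned, detected):
        for cls in want:
            totals[cls] = totals.get(cls, 0) + 1
            if cls in found:
                hits[cls] = hits.get(cls, 0) + 1
    if not totals:
        raise ValueError("no caption mentions any object class")
    # Per-class recall: fraction of images whose caption names cls where cls was detected.
    per_class = {c: hits.get(c, 0) / n for c, n in totals.items()}
    soa_c = sum(per_class.values()) / len(per_class)  # every class weighted equally
    soa_i = sum(hits.values()) / sum(totals.values())  # pooled over caption-object pairs
    return {"SOA-C": soa_c, "SOA-I": soa_i}


# Hypothetical toy inputs: three captions and what the detector found in each image.
captions_objects = [{"dog"}, {"dog", "frisbee"}, {"cat"}]
detections = [{"dog"}, {"dog"}, set()]
scores = soa_scores(captions_objects, detections)
print(scores)
```

Here `dog` is always detected while `frisbee` and `cat` never are, so the class average is 1/3 while the pooled recall is 2/4. Averaging per class keeps rare objects from being drowned out by frequent ones.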
“…In other words, if the model generates the same image, the FID will be higher (the lower the FID, the better), but IS can not penalize this case. [11,12,13] found that IS is not an appropriate metric to evaluate the text-to-image synthesis models since some models tend to generate the same image when the text contains the same word, which is not good generative models but IS could be high (the higher the IS, the better). Thus, we use FID to evaluate our models.…”
Section: Evaluation Details (mentioning; confidence: 99%)
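The snippet above argues that IS cannot penalize a model that produces one fixed image per word, whereas FID can. The minimal NumPy sketch below illustrates that argument using the standard definitions IS = exp(E_x[KL(p(y|x) || p(y))]) and FID = ||mu_r - mu_g||^2 + Tr(S_r + S_g - 2(S_r S_g)^(1/2)). Real evaluations compute both from Inception-v3 outputs. Here the "features" are synthetic Gaussian clusters and the classifier probabilities are a hypothetical fixed array, so the numbers only show the trend.

```python
import numpy as np


def inception_score(probs: np.ndarray, eps: float = 1e-12) -> float:
    """IS = exp(mean_x KL(p(y|x) || p(y))) for an (N, K) array of class probabilities."""
    p_y = probs.mean(axis=0, keepdims=True)  # marginal class distribution p(y)
    kl = np.sum(probs * (np.log(probs + eps) - np.log(p_y + eps)), axis=1)
    return float(np.exp(kl.mean()))


def frechet_distance(feats_a: np.ndarray, feats_b: np.ndarray) -> float:
    """Frechet distance between Gaussians fitted to two (N, D) feature sets (the FID formula)."""
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    # Tr((cov_a @ cov_b)^(1/2)) is the sum of square roots of the eigenvalues of cov_a @ cov_b;
    # these are real and non-negative for PSD inputs, so clip tiny round-off below zero.
    eig = np.linalg.eigvals(cov_a @ cov_b)
    tr_sqrt = np.sqrt(np.clip(eig.real, 0.0, None)).sum()
    mean_term = np.sum((mu_a - mu_b) ** 2)
    return float(mean_term + np.trace(cov_a) + np.trace(cov_b) - 2.0 * tr_sqrt)


# Toy data (hypothetical): 10 "words"/classes, 200 captions each, 8-D stand-in features.
rng = np.random.default_rng(0)
n_classes, per_class, dim = 10, 200, 8
centroids = rng.normal(size=(n_classes, dim))
labels = np.repeat(np.arange(n_classes), per_class)

real = centroids[labels] + rng.normal(size=(labels.size, dim))
diverse = centroids[labels] + rng.normal(size=(labels.size, dim))  # varied images per word
collapsed = centroids[labels]  # the same image for every caption containing a given word

# Hypothetical classifier output: each image is confidently assigned its class,
# identically for both generators, so IS cannot tell them apart.
probs = np.full((labels.size, n_classes), 0.01 / (n_classes - 1))
probs[np.arange(labels.size), labels] = 0.99

is_score = inception_score(probs)
fid_diverse = frechet_distance(real, diverse)
fid_collapsed = frechet_distance(real, collapsed)
print(f"IS (either generator): {is_score:.2f}")
print(f"FID diverse: {fid_diverse:.3f}, FID collapsed: {fid_collapsed:.3f}")
```

Both generators receive the same confident class predictions, so their IS is identical (about 9.25 out of a maximum of 10 here). The collapsed generator's features have no within-class spread, however, so its Fréchet distance to the real features is clearly larger. Computing the trace term from eigenvalues avoids an explicit matrix square root.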
“…Most existing works [8,9,10,11,12] have achieved remarkable progress by proposing effective structures of GANs. StackGAN [8] uses the stacked structure of multiple GANs to decompose the hard problem of generating high-resolution images into tractable subproblems.…”
Section: Introduction (mentioning; confidence: 99%)