2020
DOI: 10.1002/widm.1345

A survey and taxonomy of adversarial neural networks for text‐to‐image synthesis

Abstract: Text-to-image synthesis refers to computational methods which translate human written textual descriptions, in the form of keywords or sentences, into images with similar semantic meaning to the text. In earlier research, image synthesis relied mainly on word to image correlation analysis combined with supervised methods to find best alignment of the visual content matching to the text. Recent progress in deep learning (DL) has brought a new set of unsupervised deep learning methods, particularly deep generati…

citations
Cited by 62 publications
(36 citation statements)
references
References 63 publications
“…We can regroup these methods into four categories, as shown in Table 1. Each category has its own characterization [36]. • Semantic enhancement: This category represents the GANs methods which focus on generating images semantically related to the input texts, by using the neural network to encode texts as dense features, and fed-it to the second network to generate images matching to the text's description.…”
Section: Methods Categorization Of Gan-based Text-to-image Synthesis
mentioning
confidence: 99%
“…Since the original GANs frameworks were initially built upon images, there is no doubt that the number of GANs applications in the image domain surpasses other areas such as text, voice, and video. Multiple reviews [30][31][32][33][34] focus on image synthesis, even though the major proportion of the general GANs reviews mentioned above, are also in the image domain. Huang et al [31] categorize the image synthesis GANs frameworks into three types based on the overall architecture.…”
Section: Image Gans
mentioning
confidence: 99%
“…Similar applications are also reviewed elsewhere [30], with additional applications such as face ageing and 3D image synthesis. Moreover, Agnese et al [34] direct attention to GANs models that are conditioned on text and produce images. Some of these models fall under semantic enhancement GANs, where the main goal is to ensure that text is semantically coherent with the generated image.…”
Section: Image Gans
mentioning
confidence: 99%
“…Researches such as [35] and [36] focus on reviewing recent techniques to incorporate GANs in the problem of text-to-image synthesis. In [35], Agnese et al propose a taxonomy to summarize GAN based text-to-image synthesis papers into four major categories: Semantic Enhancement GANs, Resolution Enhancement GANs, Diversity Enhancement GANS, and Motion Enhancement GANs. Different from the other surveys in this field, Sampath et al [37] examine the most recent developments of GANs techniques for addressing imbalance problems in image data.…”
Section: Gan Applications
mentioning
confidence: 99%