2019
DOI: 10.48550/arxiv.1902.05395
Preprint

Realistic Image Generation using Region-phrase Attention

Abstract: The Generative Adversarial Network (GAN) has recently been applied to generate synthetic images from text. Despite significant advances, most current state-of-the-art algorithms are regular-grid region based; when attention is used, it is mainly applied between individual regular-grid regions and a word. These approaches are sufficient to generate images that contain a single object in its foreground, such as a "bird" or "flower". However, natural languages often involve complex foreground objects and the back…

Cited by 2 publications (3 citation statements)
References 13 publications (22 reference statements)
“…In [17], pedestrians are edited into predefined scenes using pix2pix, with spatial pyramid pooling in the discriminator for direct scrutiny. Pre-trained R-CNN object detection systems have been used to propose regions during GAN training in [8], or as a feature extractor in object-driven GANs [12]. In contrast, we modify the GAN discriminator itself, and leverage automatically derived RoI data.…”
Section: Related Work (mentioning)
confidence: 99%
“…The work of Huang et al. [129] improved the DAMSM loss by introducing true-grid regions inside every bounding box with word phrases, where attention weights depend on the bounding box and phrase information. So, this mechanism extends the regular grid-based attention that utilizes additional phrase features through parts-of-speech tagging besides sentence and word features.…”
Section: Direct T2i (mentioning)
confidence: 99%
“…Similar to [129], Dynamic Aspect-awarE GAN (DAE-GAN) [136] refers to the importance of aspect in the input text. The model represents text information from multiple granularities of sentence-level, word-level, and aspect-level, for which, besides other attention mechanisms, the aspect-aware dynamic re-drawer (ADR) module is employed.…”
Section: Direct T2i (mentioning)
confidence: 99%