DAE-GAN: Dynamic Aspect-aware GAN for Text-to-Image Synthesis

Ruan, Shulan; Zhang, Yong; Zhang, Kun; Fan, Yanbo; Tang, Fan; Liu, Qi; Chen, Enhong

doi:10.1109/iccv48922.2021.01370

Cited by 67 publications

(27 citation statements)

References 26 publications

“…DSE-GAN also outperforms others for R-precision (from 71.08 to 76.31, a 7.36% relative improvement), which indicates the better text-image alignment of our models. Although our DSE-GAN is inferior to some previous works (e.g., DMGAN [3] and DAE-GAN [53]) for IS on MSCOCO, IS score is known for failing to evaluate the generation quality on MSCOCO [4,12,20] and the visual inspection in Fig. 5 indicates DSE-GAN's image quality is much higher than others.…”

Section: Comparison With State-of-the-art Methods (mentioning)

confidence: 57%
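
The relative improvement quoted in the statement above can be checked with a few lines of arithmetic. This is only a sketch: the helper name `relative_improvement` is hypothetical and not taken from the cited paper; the two R-precision values are the ones quoted in the statement.

```python
def relative_improvement(baseline: float, new: float) -> float:
    """Return the relative change from baseline to new, in percent."""
    return (new - baseline) / baseline * 100.0


# R-precision values quoted in the citation statement (assumed to be percentages).
gain = relative_improvement(71.08, 76.31)
print(f"{gain:.2f}%")  # prints 7.36%
```

The absolute gain is 5.23 points; dividing by the baseline of 71.08 gives the 7.36% relative figure reported in the excerpt.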

“…We qualitatively compare the generated images from our method with three recent state-of-the-art GAN methods, i.e., DMGAN [3], DFGAN [12] and DAE-GAN [53]. For the CUB-200 benchmark, as shown in the first four columns in Fig.…”

Section: Qualitative Results (mentioning)

confidence: 99%

“…To be specific, each stage's generator refines the previous stage's results by increasing the image resolution and adding more details stage by stage. Based on this architecture, some recent works focus on introducing different cross-modal fusion mechanisms between stages to better align text and image [2,3,5,11,12,53], and others model intermediate representations (e.g., object layout or segmentation map) to bridge text and image smoothly [14,29]. However, most existing methods share one commonality: the image features are dynamically generated at each stage, but text features stay static on the contrary.…”

Section: Introduction (mentioning)

confidence: 99%

DSE-GAN: Dynamic Semantic Evolution Generative Adversarial Network for Text-to-Image Generation

Huang, Mao, Wang, et al. 2022

Proceedings of the 30th ACM International Conference on Multimedia

Text-to-image generation aims at generating realistic images which are semantically consistent with the given text. Previous works mainly adopt the multi-stage architecture by stacking generator-discriminator pairs to engage multiple adversarial training, where the text semantics used to provide generation guidance remain static across all stages. This work argues that text features at each stage should be adaptively re-composed conditioned on the status of the historical stage (i.e., historical stage's text and image features) to provide diversified and accurate semantic guidance during the coarse-to-fine generation process. We thereby propose a novel Dynamical Semantic Evolution GAN (DSE-GAN) to re-compose each stage's text features under a novel single adversarial multi-stage architecture. Specifically, we design (1) a Dynamic Semantic Evolution (DSE) module, which first aggregates historical image features to summarize the generative feedback, and then dynamically selects the words that need to be re-composed at each stage and re-composes them by dynamically enhancing or suppressing the semantics of different granularity subspaces; and (2) a Single Adversarial Multistage Architecture (SAMA), which extends the previous structure by eliminating complicated multiple adversarial training requirements, therefore allows more stages of text-image interactions, and finally facilitates the DSE module. We conduct comprehensive experiments and show that DSE-GAN achieves 7.48% and 37.8% relative FID improvement on two widely used benchmarks, i.e., respectively.

“…Similar to [129], Dynamic Aspect-awarE GAN (DAE-GAN) [136] refers to the importance of aspect in the input text. The model represents text information from multiple granularities of sentence-level, word-level, and aspect-level, for which, besides other attention mechanisms, the aspect-aware dynamic re-drawer (ADR) module is employed.…”

Section: Direct T2I (mentioning)

confidence: 99%

“…So, one future direction could be the use of the latent space of these transformer-based models for text embeddings. Another interpretation in text embeddings is the level of attention, ranging from sentences [7] to words [34], and aspects [136], where further extensions such as sensitivity to grammar, positional, and numerical information are neglected. Therefore, the new level of attention to the models is seemingly interesting.…”

Section: Generative Models (mentioning)

confidence: 99%

A Review of Multi-Modal Learning from the Text-Guided Visual Processing Viewpoint

Ullah, Lee, et al. 2022

Sensors

For decades, co-relating different data domains to attain the maximum potential of machines has driven research, especially in neural networks. Similarly, text and visual data (images and videos) are two distinct data domains with extensive research in the past. Recently, using natural language to process 2D or 3D images and videos with the immense power of neural nets has witnessed a promising future. Despite the diverse range of remarkable work in this field, notably in the past few years, rapid improvements have also solved future challenges for researchers. Moreover, the connection between these two domains is mainly subjected to GAN, thus limiting the horizons of this field. This review analyzes Text-to-Image (T2I) synthesis as a broader picture, Text-guided Visual-output (T2Vo), with the primary goal being to highlight the gaps by proposing a more comprehensive taxonomy. We broadly categorize text-guided visual output into three main divisions and meaningful subdivisions by critically examining an extensive body of literature from top-tier computer vision venues and closely related fields, such as machine learning and human–computer interaction, aiming at state-of-the-art models with a comparative analysis. This study successively follows previous surveys on T2I, adding value by analogously evaluating the diverse range of existing methods, including different generative models, several types of visual output, critical examination of various approaches, and highlighting the shortcomings, suggesting the future direction of research.

A comprehensive review of generative adversarial networks: Fundamentals, applications, and challenges

Mohammed 2023

WIREs Computational Stats

In machine learning, a generative model is responsible for generating new samples of data in terms of a probabilistic model. The generative adversarial network (GAN) has been widely used to generate realistic samples in different domains and outperforms its peers in the generative models family. However, producing a robust GAN model is not a trivial task because many challenges face the GAN during the training process and impact its performance, affecting the quality and diversity of the generated samples. In this article, we conduct a comprehensive review of GANs to present the fundamentals of GAN, including its components, types, and objective functions. Also, we present an overview of the evaluation metrics used to evaluate GAN models. Moreover, we list the applications of GANs and research work in various domains. Finally, we present the challenges that face GANs and highlight two significant issues, mode collapse and training instability, in addition to the research efforts that tackle these challenges. This article is categorized under: Statistical Learning and Exploratory Methods of the Data Sciences > Deep Learning; Statistical Learning and Exploratory Methods of the Data Sciences > Neural Networks
