2017 IEEE 27th International Workshop on Machine Learning for Signal Processing (MLSP)
DOI: 10.1109/mlsp.2017.8168140

Adversarial nets with perceptual losses for text-to-image synthesis

Abstract: Recent approaches in generative adversarial networks (GANs) can automatically synthesize realistic images from descriptive text. Despite the overall fair quality, the generated images often expose visible flaws that lack structural definition for an object of interest. In this paper, we aim to extend the state of the art for GAN-based text-to-image synthesis by improving the perceptual quality of generated images. Differentiated from previous work, our synthetic image generator optimizes on perceptual loss functions t…
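The truncated abstract centers on augmenting the GAN generator objective with perceptual losses. As a minimal sketch of the general technique (not the paper's exact configuration), the feature-reconstruction flavor of perceptual loss can be written in PyTorch as follows; the choice of VGG-16 and the relu2_2 layer are assumptions borrowed from the commonly cited Johnson et al. (2016) setup:

import torch.nn as nn
from torchvision import models

class PerceptualLoss(nn.Module):
    """Feature-reconstruction ('perceptual') loss: compares two images in the
    feature space of a frozen ImageNet-pretrained CNN."""
    def __init__(self, layer_index=8):  # index 8 = relu2_2 in VGG-16 (assumed layer choice)
        super().__init__()
        vgg = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)
        self.features = vgg.features[:layer_index + 1].eval()
        for p in self.features.parameters():
            p.requires_grad = False  # the feature extractor stays fixed during GAN training

    def forward(self, fake, real):
        # L2 distance between activations of generated and real images
        return nn.functional.mse_loss(self.features(fake), self.features(real))

A generator would then minimize something like loss_adv + lam * perceptual(fake, real), where lam is a hypothetical weight; the paper's exact layer selection and weighting are not reproduced here.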

Cited by 33 publications (31 citation statements)
References 9 publications
“…We can compare with it for a general measure of the image diversity. Following the procedure of Prog.GAN, we randomly sample ∼10,000 image pairs from all generated samples (Table 3, right). HDGAN outperforms both methods.…”
Section: Comparative Results
confidence: 99%
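For context on the quoted protocol: Prog.GAN's diversity measurement is commonly reported as the mean MS-SSIM over randomly drawn pairs of generated images (lower means more diverse). A rough sketch of that measurement, assuming the third-party pytorch_msssim package is available and a hypothetical sample_fn(n) hook that returns n generated images in [0, 1]:

import torch
from pytorch_msssim import ms_ssim  # third-party package (assumed available)

def diversity_score(sample_fn, num_pairs=10_000, batch=100):
    """Mean MS-SSIM over random pairs of generated images; lower = more diverse.
    Images should be larger than ~160 px per side for the default MS-SSIM settings."""
    scores = []
    with torch.no_grad():
        for _ in range(num_pairs // batch):
            a, b = sample_fn(batch), sample_fn(batch)
            scores.append(ms_ssim(a, b, data_range=1.0))  # mean over the batch of pairs
    return torch.stack(scores).mean().item()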
“…Recently, Dong et al [8] propose to learn a joint embedding of images and text so as to re-render a prototype image conditioned on a target description. Cha et al [3] explore the use of the perceptual loss [16] with a CNN pretrained on ImageNet, and Dash et al [6] make use of auxiliary classifiers (similar to [31]) to assist GAN training for text-to-image synthesis. Xu et al [43] show an attention-driven method to improve fine-grained details.…”
Section: Related Work
confidence: 99%
“…Finally, we additionally impose a reconstruction loss L_rec that encourages the predicted instance masks to be similar to the ground truths. We implement this idea using a perceptual loss [11,3,33,2], which measures the distance of real and fake images in the feature space of a pre-trained CNN by…”
Section: Shape Generation
confidence: 99%
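The quote truncates before the formula; for reference, the standard feature-space perceptual loss has the general form below, where φ_l denotes activations at layer l of a fixed pretrained CNN. The layer set and weights λ_l are implementation choices, not values taken from the cited work:

\mathcal{L}_{\mathrm{rec}}(x, \hat{x}) = \sum_{l} \lambda_l \left\| \phi_l(x) - \phi_l(\hat{x}) \right\|_2^2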
“…In this work we introduce AR-GAN, which differs from previous approaches by optimizing an activation reconstruction loss (Johnson et al., 2016; Cha et al., 2017) in addition to regularizing the original GAN objective function and cycle-consistency optimizations, to produce visually more compelling synthetic images on an unaligned dataset. The main focus of this work is to analyze the performance of plant disease recognition systems using synthetically generated image data.…”
Section: Introduction
confidence: 99%
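Schematically, the AR-GAN generator objective described in this quote combines three terms; the weighting coefficients below are hypothetical placeholders, since the quote gives none:

\mathcal{L}_{G} = \mathcal{L}_{\mathrm{adv}} + \lambda_{\mathrm{AR}} \, \mathcal{L}_{\mathrm{AR}} + \lambda_{\mathrm{cyc}} \, \mathcal{L}_{\mathrm{cyc}}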