C4Synth: Cross-Caption Cycle-Consistent Text-to-Image Synthesis

Joseph, K J; Pal, Arghya; Rajanala, Sailaja; Balasubramanian, Vineeth N

doi:10.1109/wacv.2019.00044

Cited by 24 publications

(14 citation statements)

References 4 publications

“…Since common datasets often contain more than one caption per image, using multiple captions could provide additional information to better describe the whole scene. C4Synth [91] uses multiple captions by employing a cross-caption cycle consistency which ensures that a generated image is consistent with a set of semantically similar sentences. It operates sequentially by iterating over all captions and improves the image quality by distilling concepts from multiple captions [91].…”

Section: Multiple Captions (mentioning)

confidence: 99%

“…They enrich an input description by extracting features from multiple captions to guide an attentional image generator. In contrast to [91], RiFeGAN does not need an image captioning network and is executed once instead of multiple times.…”

Section: Multiple Captions (mentioning)

confidence: 99%

“…[125] multiple captions [91], [92] multiple captions + mouse traces [126] Table 2: Methods grouped by their supervision. We define "layout" as bounding box and class label annotations, and "masks" as labelled, instance segmentation masks.…”

Section: Evaluation of T2I Models (mentioning)

confidence: 99%

Adversarial Text-to-Image Synthesis: A Review

Frolov, Hinz, Raue et al. 2021 (preprint)

With the advent of generative adversarial networks, synthesizing images from textual descriptions has recently become an active research area. It is a flexible and intuitive way for conditional image generation with significant progress in the last years regarding visual realism, diversity, and semantic alignment. However, the field still faces several challenges that require further research efforts such as enabling the generation of high-resolution images with multiple objects, and developing suitable and reliable evaluation metrics that correlate with human judgement. In this review, we contextualize the state of the art of adversarial text-to-image synthesis models, their development since their inception five years ago, and propose a taxonomy based on the level of supervision. We critically examine current strategies to evaluate text-to-image synthesis models, highlight shortcomings, and identify new areas of research, ranging from the development of better datasets and evaluation metrics to possible improvements in architectural design and model training. This review complements previous surveys on generative adversarial networks with a focus on text-to-image synthesis which we believe will help researchers to further advance the field.

“…Text-guided Synthesis. Pioneered by GAN-INT-CLS [36] and GAWWN [37], conditional generative adversarial networks (GANs) [12] have been the dominant framework for text-based image synthesis [19,23,32,42,46,50]. Recent work DALL-E [34] shows promising results with transformers [43] and discrete VAE [35] by leveraging web-scale data.…”

Section: Related Work (mentioning)

confidence: 99%

More Control for Free! Image Synthesis with Semantic Diffusion Guidance

Liu, Park, Azadi et al. 2021 (preprint)

Controllable image synthesis models allow creation of diverse images based on text instructions or guidance from an example image. Recently, denoising diffusion probabilistic models have been shown to generate more realistic imagery than prior methods, and have been successfully demonstrated in unconditional and class-conditional settings. We explore fine-grained, continuous control of this model class, and introduce a novel unified framework for semantic diffusion guidance, which allows either language or image guidance, or both. Guidance is injected into a pretrained unconditional diffusion model using the gradient of image-text or image matching scores. We explore CLIP-based textual guidance as well as both content and style-based image guidance in a unified form. Our text-guided synthesis approach can be applied to datasets without associated text annotations. We conduct experiments on FFHQ and LSUN datasets, and show results on fine-grained text-guided image synthesis, synthesis of images related to a style or content example image, and examples with both textual and image guidance. Project page: xh-liu.github.io/sdg/
