2022
DOI: 10.1007/978-3-031-19836-6_6
VQGAN-CLIP: Open Domain Image Generation and Editing with Natural Language Guidance

Cited by 123 publications (61 citation statements)
References 24 publications
“…With the successful application of transformer-based architectures in natural language processing (NLP), text-to-image systems based on deep generative models have become popular means for computer vision tasks [9,10]. They generate creative images combining concepts, attributes, and styles from expressive text descriptions [11].…”
Section: Text-to-image Systems (mentioning)
confidence: 99%
“…This model is used by the first version of DALL-E [17]. As a variant, VQ-GAN represents a variety of modalities with discrete latent representations by building a codebook vocabulary with a finite set of learned embeddings and using a Transformer in place of the PixelCNN prior of VQ-VAE [10]. In addition, a PatchGAN discriminator is used to add an adversarial loss during training.…”
Section: Text-to-image Systems (mentioning)
confidence: 99%
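The codebook mechanism described in this excerpt can be made concrete with a short sketch. The minimal PyTorch module below (shapes, sizes, and the straight-through trick are illustrative assumptions, not the VQ-GAN reference implementation) shows the nearest-neighbour quantisation that maps encoder features onto a finite vocabulary of learned embeddings; in VQ-GAN the resulting discrete indices are then modelled by a Transformer, and a PatchGAN discriminator supplies the adversarial loss.

```python
import torch
import torch.nn as nn

class VectorQuantizer(nn.Module):
    """Minimal nearest-neighbour codebook lookup (VQ-VAE / VQ-GAN style sketch)."""

    def __init__(self, num_codes: int = 1024, code_dim: int = 256):
        super().__init__()
        # Finite vocabulary of learned embeddings (the "codebook").
        self.codebook = nn.Embedding(num_codes, code_dim)
        self.codebook.weight.data.uniform_(-1.0 / num_codes, 1.0 / num_codes)

    def forward(self, z):
        # z: encoder features of shape (batch, code_dim, H, W)
        b, c, h, w = z.shape
        flat = z.permute(0, 2, 3, 1).reshape(-1, c)  # (b*h*w, code_dim)
        # Squared Euclidean distance from each feature to every codebook vector.
        dists = (flat.pow(2).sum(1, keepdim=True)
                 - 2 * flat @ self.codebook.weight.t()
                 + self.codebook.weight.pow(2).sum(1))
        indices = dists.argmin(dim=1)                # discrete token ids
        z_q = self.codebook(indices).view(b, h, w, c).permute(0, 3, 1, 2)
        # Straight-through estimator so gradients still reach the encoder.
        z_q = z + (z_q - z).detach()
        return z_q, indices.view(b, h, w)
```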
“…The necessity to generate new biophilic shapes and images as a form of chance-based creativity [15] determined the selection of the VQGAN+CLIP code. This code, available in a public repository [20], allows generating images of sufficient visual quality from text quotes and keywords. The code was used without any interface platform in order to avoid the aesthetic uniformity of 'platform art' identified by several digital art critics [1], [16].…”
Section: Research Methodology and Results (mentioning)
confidence: 99%
“…She [15] concludes that advanced text-to-image synthesis models, such as DALL-E presented by OpenAI, will represent an important trend in the future of AI art. Recently, another text-to-image tool, VQGAN+CLIP (Vector Quantized Generative Adversarial Network and Contrastive Language-Image Pre-training), was introduced in the publication by K. Crowson et al. [20]; the authors underline that it is a new methodology, 'which is capable of producing images of high visual quality from text prompts of significant semantic complexity without any training by using a multimodal encoder to guide image generations', producing 'higher visual quality outputs than prior, less flexible approaches like DALL-E' and others. The VQGAN+CLIP code is available in a public repository for art experiments.…”
Section: Literature Review: AI and Biophilic Design (mentioning)
confidence: 99%
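The "multimodal encoder to guide image generations" quoted above can be illustrated with a rough sketch of a CLIP-guided optimisation loop. It assumes OpenAI's clip package is installed and uses a hypothetical decode_latents callable as a stand-in for a pretrained VQGAN decoder; it is an approximation of the idea, not the code released by Crowson et al.

```python
import torch
import torch.nn.functional as F
import clip  # OpenAI CLIP, assumed installed

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _ = clip.load("ViT-B/32", device=device)
model = model.float()  # keep everything in fp32 for simplicity

def clip_guided_generation(prompt, decode_latents, latents, steps=300, lr=0.1):
    """Optimise latents so the decoded image matches the text prompt under CLIP.

    decode_latents: hypothetical callable mapping latents -> (B, 3, H, W) image
    in [0, 1]; a real pretrained VQGAN decoder would fill this role.
    """
    with torch.no_grad():
        text_feat = model.encode_text(clip.tokenize([prompt]).to(device))
        text_feat = text_feat / text_feat.norm(dim=-1, keepdim=True)

    latents = latents.clone().requires_grad_(True)
    optimizer = torch.optim.Adam([latents], lr=lr)

    for _ in range(steps):
        image = decode_latents(latents)
        # Resize to CLIP's input resolution; real implementations also use
        # random crops/augmentations and CLIP's normalisation statistics.
        image = F.interpolate(image, size=224, mode="bilinear", align_corners=False)
        img_feat = model.encode_image(image)
        img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)
        loss = 1.0 - (img_feat * text_feat).sum(dim=-1).mean()  # cosine distance to prompt
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return decode_latents(latents.detach())
```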
“…Diffusion models [44,115] transform a user edit (e.g., brushstrokes) to be realistic-looking by adding noise and denoising [76,96]. Generative transformers [30] can be applied to edit images with text-guidance [21] or user-defined masks [18]. Besides image editing, several works enable users to edit and customize a generative model.…”
Section: Related Work (mentioning)
confidence: 99%
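The "adding noise and denoising" editing procedure cited in this last statement can be sketched as follows; denoise_step and alphas_cumprod are hypothetical stand-ins for a pretrained diffusion model's single reverse step and its noise schedule, so this is only an outline of the idea.

```python
import torch

def noise_and_denoise_edit(edited_image, denoise_step, alphas_cumprod, start_t=400):
    """Sketch of SDEdit-style editing: noise a coarse user edit (e.g. brushstrokes)
    to an intermediate diffusion timestep, then denoise it back to a realistic image.

    denoise_step(x_t, t): hypothetical wrapper around one reverse diffusion step;
    alphas_cumprod: the model's cumulative noise schedule (a 1-D tensor).
    """
    a_bar = alphas_cumprod[start_t]
    noise = torch.randn_like(edited_image)
    # Forward diffusion jump: x_t = sqrt(a_bar) * x_0 + sqrt(1 - a_bar) * eps
    x_t = a_bar.sqrt() * edited_image + (1.0 - a_bar).sqrt() * noise
    # Reverse diffusion from start_t back down to a clean image.
    for t in range(start_t, 0, -1):
        x_t = denoise_step(x_t, t)
    return x_t
```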