2022
DOI: 10.1007/978-3-031-19836-6_6
VQGAN-CLIP: Open Domain Image Generation and Editing with Natural Language Guidance

Cited by 123 publications (61 citation statements)
References 24 publications
“…With the successful application of transformer-based architectures in natural language processing (NLP), text-to-image systems based on deep generative models have become popular means for computer vision tasks [9,10]. They generate creative images combining concepts, attributes, and styles from expressive text descriptions [11].…”
Section: Text-to-image Systems (mentioning)
confidence: 99%
“…This model is used by the first version of DALL-E [17]. As a variant, VQ-GAN represents a variety of modalities with discrete latent representations by building a codebook vocabulary with a finite set of learned embeddings and using a Transformer in place of the PixelCNN prior of VQ-VAE [10]. In addition, a PatchGAN discriminator is used to add an adversarial loss during training.…”
Section: Text-to-image Systems (mentioning)
confidence: 99%
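The codebook mechanism described in this excerpt can be made concrete with a short sketch. The minimal PyTorch module below (shapes, sizes, and the straight-through trick are illustrative assumptions, not the VQ-GAN reference implementation) shows the nearest-neighbour quantisation that maps encoder features onto a finite vocabulary of learned embeddings; in VQ-GAN the resulting discrete indices are then modelled by a Transformer, and a PatchGAN discriminator supplies the adversarial loss.

```python
import torch
import torch.nn as nn

class VectorQuantizer(nn.Module):
    """Minimal nearest-neighbour codebook lookup (VQ-VAE / VQ-GAN style sketch)."""

    def __init__(self, num_codes: int = 1024, code_dim: int = 256):
        super().__init__()
        # Finite vocabulary of learned embeddings (the "codebook").
        self.codebook = nn.Embedding(num_codes, code_dim)
        self.codebook.weight.data.uniform_(-1.0 / num_codes, 1.0 / num_codes)

    def forward(self, z):
        # z: encoder features of shape (batch, code_dim, H, W)
        b, c, h, w = z.shape
        flat = z.permute(0, 2, 3, 1).reshape(-1, c)  # (b*h*w, code_dim)
        # Squared Euclidean distance from each feature to every codebook vector.
        dists = (flat.pow(2).sum(1, keepdim=True)
                 - 2 * flat @ self.codebook.weight.t()
                 + self.codebook.weight.pow(2).sum(1))
        indices = dists.argmin(dim=1)                # discrete token ids
        z_q = self.codebook(indices).view(b, h, w, c).permute(0, 3, 1, 2)
        # Straight-through estimator so gradients still reach the encoder.
        z_q = z + (z_q - z).detach()
        return z_q, indices.view(b, h, w)
```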
“…The necessity to generate new biophilic shapes and images as a form of chance-based creativity [15] determined the selection of the VQGAN+CLIP code. This code, available in a public repository [20], allows generating images of sufficient visual quality from text quotes and keywords. The code was used without any interface platform in order to avoid the aesthetic uniformity of 'platform art' identified by several digital art critics [1], [16].…”
Section: Research Methodology and Results (mentioning)
confidence: 99%
“…She [15] concludes that advanced text-to-image synthesis models, such as DALL-E presented by OpenAI, will represent an important trend in the future of AI art. Recently, another text-to-image tool, VQGAN+CLIP (Vector Quantized Generative Adversarial Network and Contrastive Language-Image Pre-training), was introduced in the publication by K. Crowson et al. [20]; the authors underline that it is a new methodology, 'which is capable of producing images of high visual quality from text prompts of significant semantic complexity without any training by using a multimodal encoder to guide image generations', producing 'higher visual quality outputs than prior, less flexible approaches like DALL-E' and others. The VQGAN+CLIP code is available in a public repository for art experiments.…”
Section: Literature Review: AI and Biophilic Design (mentioning)
confidence: 99%
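The "multimodal encoder to guide image generations" quoted above can be illustrated with a rough sketch of a CLIP-guided optimisation loop. It assumes OpenAI's clip package is installed and uses a hypothetical decode_latents callable as a stand-in for a pretrained VQGAN decoder; it is an approximation of the idea, not the code released by Crowson et al.

```python
import torch
import torch.nn.functional as F
import clip  # OpenAI CLIP, assumed installed

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _ = clip.load("ViT-B/32", device=device)
model = model.float()  # keep everything in fp32 for simplicity

def clip_guided_generation(prompt, decode_latents, latents, steps=300, lr=0.1):
    """Optimise latents so the decoded image matches the text prompt under CLIP.

    decode_latents: hypothetical callable mapping latents -> (B, 3, H, W) image
    in [0, 1]; a real pretrained VQGAN decoder would fill this role.
    """
    with torch.no_grad():
        text_feat = model.encode_text(clip.tokenize([prompt]).to(device))
        text_feat = text_feat / text_feat.norm(dim=-1, keepdim=True)

    latents = latents.clone().requires_grad_(True)
    optimizer = torch.optim.Adam([latents], lr=lr)

    for _ in range(steps):
        image = decode_latents(latents)
        # Resize to CLIP's input resolution; real implementations also use
        # random crops/augmentations and CLIP's normalisation statistics.
        image = F.interpolate(image, size=224, mode="bilinear", align_corners=False)
        img_feat = model.encode_image(image)
        img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)
        loss = 1.0 - (img_feat * text_feat).sum(dim=-1).mean()  # cosine distance to prompt
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return decode_latents(latents.detach())
```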
“…Diffusion models [44,115] transform a user edit (e.g., brushstrokes) to be realistic-looking by adding noise and denoising [76,96]. Generative transformers [30] can be applied to edit images with text-guidance [21] or user-defined masks [18]. Besides image editing, several works enable users to edit and customize a generative model.…”
Section: Related Work (mentioning)
confidence: 99%
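The "adding noise and denoising" editing procedure cited in this last statement can be sketched as follows; denoise_step and alphas_cumprod are hypothetical stand-ins for a pretrained diffusion model's single reverse step and its noise schedule, so this is only an outline of the idea.

```python
import torch

def noise_and_denoise_edit(edited_image, denoise_step, alphas_cumprod, start_t=400):
    """Sketch of SDEdit-style editing: noise a coarse user edit (e.g. brushstrokes)
    to an intermediate diffusion timestep, then denoise it back to a realistic image.

    denoise_step(x_t, t): hypothetical wrapper around one reverse diffusion step;
    alphas_cumprod: the model's cumulative noise schedule (a 1-D tensor).
    """
    a_bar = alphas_cumprod[start_t]
    noise = torch.randn_like(edited_image)
    # Forward diffusion jump: x_t = sqrt(a_bar) * x_0 + sqrt(1 - a_bar) * eps
    x_t = a_bar.sqrt() * edited_image + (1.0 - a_bar).sqrt() * noise
    # Reverse diffusion from start_t back down to a clean image.
    for t in range(start_t, 0, -1):
        x_t = denoise_step(x_t, t)
    return x_t
```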