2022
DOI: 10.48550/arxiv.2205.11487
Preprint

Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding

Abstract: We present Imagen, a text-to-image diffusion model with an unprecedented degree of photorealism and a deep level of language understanding. Imagen builds on the power of large transformer language models in understanding text and hinges on the strength of diffusion models in high-fidelity image generation. Our key discovery is that generic large language models (e.g. T5), pretrained on text-only corpora, are surprisingly effective at encoding text for image synthesis: increasing the size of the language model …

Cited by 191 publications (316 citation statements)
References 35 publications
“…Conditional image generation is a task to synthesize an intended image provided that conditional information is given. The conditional information could be an image [11], [13], [64], [65], [66], [67], [68], [69], [70], [71], [72], [73], [74], a categorical label [10], [27], [75], [76], [77], [78], [79], [80], [81], [82], [83], [84], or a text [85], [86], [87], [88], [89], [90]. With GAN's surging popularity, conditional GAN (cGAN) has become a representative framework for conditional image generation, and the cGANs specialized in each conditional information are being studied as separate research topics.…”
Section: Conditional Image Generation
mentioning
confidence: 99%
“…Fine-Grained Stylization with ArtBench. Many powerful models emulate text-driven stylization by adding the postfix "... in the style of ..." to a given prompt [22,19,21,23,30]. By using style specific databases obtained from the ArtBench dataset [16] during inference, we here present an alternative approach.…”
Section: Recap On Retrieval-augmented Diffusion Models
mentioning
confidence: 99%
“…Diffusion models have recently set the state of the art in image generation and controllable synthesis [9,22]. In text-to-image synthesis in particular, we have seen impressive results [21,23] that can also be used to create artistic images. Such models thus have the potential to help artists create new content and have contributed to the tremendous growth of the field of AI generated art [7].…”
mentioning
confidence: 99%
“…While there have been various standardized image synthesis benchmarks such as CIFAR-10 [26], ImageNet [8], and LSUN [57], they are biased towards photos of real scenes or objects rather than artwork. Recent works like DALLE 2 [39], GLIDE [37], and Imagen [42] datasets have demonstrated the potential of generative models to generate artworks. However, those models are trained on a mixture of photos and artworks of various styles, making it difficult to conduct analysis and evaluations on artwork generation of particular artistic styles.…”
Section: Mark Rothko
mentioning
confidence: 99%