UPainting: Unified Text-to-Image Diffusion Generation with Cross-modal Guidance

Li, Wei; Xue, Xiangxin; Xiao, Xinyan; Liu, Jiachen; Yang, Hu; Li, Guohao; Wang, Zhanpeng; Feng, Zhihong; She, Qiaoqiao; Lyu, Yajuan; Wang, Hua

doi:10.48550/arxiv.2210.16031

Cited by 2 publications

(11 citation statements)

References 19 publications

“…There are also other metrics for text-to-image evaluation, including Inception score (IS) [85] for image quality and R-precision for text-to-image generation [9]. [12] 27.10; LAFITE [60] 26.94; DALLE [11] 17.89; GLIDE [15] 12.24; Imagen [16] 7.27; Stable Diffusion [17] 12.63; VQ-Diffusion [43] 13.86; DALL-E 2 [18] 10.39; UPainting [63] 8.34; ERNIE-ViLG 2.0 [64] 6.75; eDiff-I [65] 6.95…”

Section: Technical Evaluation Of Text-to-image Methods (mentioning)

confidence: 99%

“…Recent evaluation benchmarks. Apart from the automatic metrics discussed above, multiple works involve human evaluation and propose their new evaluation benchmarks [14], [16], [63], [73], [80], [86], [87]. We summarize representative benchmarks in Table 2.…”

Section: Technical Evaluation Of Text-to-image Methods (mentioning)

confidence: 99%

“…Specifically, GLIDE [15] finds that CLIP guidance underperforms the classifier-free variant of guidance. By contrast, another work, UPainting [63], points out that the lack of a large-scale transformer language model makes it difficult for these CLIP-guided models to encode text prompts and generate complex scenes with details. By combining a large language model and cross-modal matching models, UPainting [63] significantly improves the sample fidelity and image-text alignment of generated images.…”

Section: Improving Model Architectures (mentioning)

confidence: 99%

“…By contrast, another work, UPainting [63], points out that the lack of a large-scale transformer language model makes it difficult for these CLIP-guided models to encode text prompts and generate complex scenes with details. By combining a large language model and cross-modal matching models, UPainting [63] significantly improves the sample fidelity and image-text alignment of generated images. The general image synthesis capability enables UPainting [63] to generate images in both simple and complex scenes.…”

Section: Improving Model Architectures (mentioning)

confidence: 99%

“…By combining a large language model and cross-modal matching models, UPainting [63] significantly improves the sample fidelity and image-text alignment of generated images. The general image synthesis capability enables UPainting [63] to generate images in both simple and complex scenes.…”

Section: Improving Model Architectures (mentioning)

confidence: 99%

Generative Adversarial Networks for Diverse and Explainable Text-to-Image Generation

Zhang¹

DreamStyler: Paint by Style Inversion with Text-to-Image Diffusion Models

Ahn, Lee, Lee, et al. 2024, AAAI

Recent progress in large-scale text-to-image models has yielded remarkable accomplishments, finding various applications in the art domain. However, expressing the unique characteristics of an artwork (e.g. brushwork, color tone, or composition) with text prompts alone may encounter limitations due to the inherent constraints of verbal description. To this end, we introduce DreamStyler, a novel framework designed for artistic image synthesis, proficient in both text-to-image synthesis and style transfer. DreamStyler optimizes a multi-stage textual embedding with a context-aware text prompt, resulting in prominent image quality. In addition, with content and style guidance, DreamStyler exhibits flexibility to accommodate a range of style references. Experimental results demonstrate its superior performance across multiple scenarios, suggesting its promising potential in artistic product creation. Project page: https://nmhkahn.github.io/dreamstyler/
