2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)
DOI: 10.1109/wacv56688.2023.00037
More Control for Free! Image Synthesis with Semantic Diffusion Guidance

Cited by 82 publications (21 citation statements)
References 35 publications
“…Recently powerful controlling mechanisms [46,21,17] emerged to guide the diffusion process for text-to-image generation. Particularly, ControlNet [46] enables to condition the generation process using edges, pose, semantic masks, image depths, etc.…”
Section: Conditional and Specialized Text-to-video
confidence: 99%
“…To further explore the extensibility of diffusion models, many works have been devoted to diffusion-based conditional generation, which can be broadly classified into two categories. The first one is the approach known as classifier-guidance (Liu et al, 2023), which utilizes a classifier to promote the sampling process of the pre-trained unconditional model. Despite the low cost, the generation effect is less competitive.…”
Section: Preliminaries and Background
confidence: 99%
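The classifier-guidance mechanism quoted above shifts the diffusion model's noise prediction using the gradient of a classifier's log-probability. A minimal NumPy sketch of that update rule is below; the function name, the guidance scale, and the toy tensors are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def guided_noise(eps, classifier_grad, sigma, scale):
    """Classifier-guided noise estimate: the unconditional prediction `eps`
    is shifted along the classifier gradient of log p(y | x), weighted by
    the current noise level `sigma` and a user-chosen guidance `scale`."""
    return eps - scale * sigma * classifier_grad

# Toy example with made-up values (no real model or classifier involved).
eps = np.ones(4)              # unconditional noise prediction from the diffusion model
grad = np.full(4, 0.5)        # hypothetical gradient of log p(y | x) w.r.t. x
eps_hat = guided_noise(eps, grad, sigma=1.0, scale=2.0)
print(eps_hat)                # guided prediction used in the sampling step
```

A larger `scale` pushes samples more strongly toward the class label at the cost of diversity, which matches the quoted observation that pure classifier guidance is cheap but can be less competitive in sample quality.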
“…Both "Paint By Word" [5] and ManiGAN [28] are restricted to specific image domains and are not applicable to open natural images. SDG [31] and DiffusionCLIP [25] are proposed to utilize a diffusion model in order to perform global text-guided image manipulations. GLIDE [36] and DALL•E 2 [39] focus on text-driven open domain image synthesis, as well as local image editing.…”
Section: Text-guided Image Manipulation
confidence: 99%