2023
DOI: 10.48550/arxiv.2302.11797
Preprint

Region-Aware Diffusion for Zero-shot Text-driven Image Editing

Fig. 1: Results of the proposed region-aware diffusion model (RDM). The texts adhere to the phrase rule "A → B", indicating that RDM transforms entity A into entity B.

Abstract: Image manipulation under the guidance of textual descriptions has recently received a broad range of attention. In this study, we focus on the regional editing of images under the guidance of given text prompts. Unlike current mask-based image editing methods, we propose a novel region-aware diffusion model (RDM) for entity-level image…

Cited by 2 publications (5 citation statements)
References 42 publications
“…Recent entity-level editing methods (Huang et al 2023;Hertz et al 2022) have been inspired by exerting control over the latent space or attention maps (Chen, Laina, and Vedaldi 2023). Their constraints on the initial image layout hinder the ability to make substantial structural modifications, not to mention the process of object addition or removal.…”
Section: Introduction (mentioning)
confidence: 99%
“…1. To mitigate the interference issue, we employ a pre-trained CLIP segmentation model from RDM (Huang et al 2023), denoted as guidance model Φ, to impose spatial-aware guidance.…”
Section: Multi-Region-Guided Diffusion (mentioning)
confidence: 99%
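The spatial-aware guidance described in the statement above — using a pre-trained CLIP segmentation model to confine edits to the text-relevant region — can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a per-pixel text-image similarity map (as a CLIP-based segmentation model such as CLIPSeg would produce) and shows only how thresholding it yields a soft mask; `spatial_guidance_mask` and its parameters are hypothetical names.

```python
import numpy as np

def spatial_guidance_mask(similarity_map, threshold=0.5):
    """Turn a per-pixel text-image similarity map (e.g. from a
    CLIP-based segmentation model) into a soft editing mask.

    Pixels at or above `threshold` are fully editable (1.0); the
    rest fade out linearly so the edit blends into its surroundings.
    """
    sim = np.clip(similarity_map, 0.0, 1.0)
    return np.where(sim >= threshold, 1.0, sim / threshold)

# Toy 2x2 similarity map: the left column scores low except the
# top-left pixel, which clearly matches the text prompt.
sim = np.array([[0.9, 0.2],
                [0.1, 0.6]])
mask = spatial_guidance_mask(sim, threshold=0.5)
```

A soft (rather than binary) mask is one common way to avoid visible seams at the region boundary; the exact masking scheme in the cited work may differ.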
“…FISEdit [221], Blended Latent Diffusion [222], PFB-Diff [223], DiffEdit [224], RDM [225], MFL [226], Differential Diffusion [227], Watch Your Steps [228], Blended Diffusion [229], ZONE [230], Inpaint Anything [231] Multi-Noise Redirection The Stable Artist [232], SEGA [233], LEDITS [234], OIR-Diffusion [235] Fig. 7: Taxonomy of training and finetuning free approaches for image editing.…”
Section: Training and Finetuning Free Approaches (mentioning)
confidence: 99%
“…This selective editing approach safeguards unedited regions and preserves their semantic integrity. RDM [225] introduces a region-aware diffusion model that seamlessly integrates masks to automatically pinpoint and edit regions of interest based on text-driven guidance. MFL [226] proposes a two-stage mask-free training paradigm tailored for textguided image editing.…”
Section: Mask Guidance (mentioning)
confidence: 99%
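The "selective editing" that the statement above attributes to mask-guided methods such as RDM and Blended Diffusion typically reduces to a per-step blending rule: inside the mask the prompt-driven denoised latent is kept, while outside it the latent is reset to the original image noised to the same timestep, which is what safeguards unedited regions. A minimal numpy sketch under that assumption (names hypothetical, not the papers' code):

```python
import numpy as np

def blend_step(x_edited, x_orig_noised, mask):
    """One mask-guided blending step for region-aware editing.

    x_edited      : current prompt-driven denoised latent
    x_orig_noised : original image latent, noised to the same timestep
    mask          : 1.0 inside the edit region, 0.0 outside

    Inside the mask the edit proceeds; outside it, the latent is
    reset to the original, preserving unedited content.
    """
    return mask * x_edited + (1.0 - mask) * x_orig_noised

x_edit = np.full((2, 2), 5.0)   # toy prompt-driven latent
x_orig = np.full((2, 2), 1.0)   # toy noised original latent
mask = np.array([[1.0, 0.0],
                [0.0, 0.0]])    # edit only the top-left pixel
x_t = blend_step(x_edit, x_orig, mask)
```

Applying this blend at every denoising step keeps the region boundary consistent with the evolving noise level, which is why the unedited background retains its semantic integrity.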