Imagen Editor and EditBench: Advancing and Evaluating Text-Guided Image Inpainting

Wang, Su; Saharia, Chitwan; Montgomery, Ceslee; Pont-Tuset, Jordi; Noy, Shai; Pellegrini, Stefano; Onoe, Yasumasa; Laszlo, Sarah; Fleet, David J.; Soricut, Radu; Baldridge, Jason; Norouzi, Mohammad; Anderson, Peter; Chan, William

doi:10.1109/cvpr52729.2023.01761

Cited by 37 publications (9 citation statements)

References 28 publications


“…The first set comprises the structure loss [Eq. (11)] and the texture loss [Eq. (12)], aiming at guiding the inference process for the structure and texture of the inpainted regions.…”

Section: Loss Function (mentioning)

confidence: 99%


Mutual encoder-decoder with bi-gated convolution for image inpainting

Yu, Yang (2024). J. Electron. Imag.


Partial convolution and gated convolution (GC) have been widely used to overcome the limitations of vanilla convolution. However, both approaches have drawbacks: the single-channel binary mask used in partial convolution restricts its flexibility, and the accuracy of the gating values learned by GC cannot be guaranteed. To overcome these limitations, we propose an approach called bi-gated convolution. It adaptively integrates the binary mask and the gating values learned by the network to obtain refined gating values that effectively characterize the features, thereby improving the accuracy of the gating values. Furthermore, we propose a feature-adaptive supplementation operation designed specifically for repairing damaged areas within the encoder features. Finally, we present a mutual encoder-decoder architecture with bi-gated convolution. Experiments on two benchmark datasets show that the proposed method generates visually plausible results.
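The abstract contrasts two baselines: partial convolution, driven by a single-channel binary mask, and gated convolution, driven by learned soft gates. A minimal 1-D sketch of those two baselines follows, assuming the standard formulations (partial convolution renormalises over valid pixels and updates the mask; gated convolution multiplies an activated feature branch by the sigmoid of a gating branch, with ReLU chosen here as the activation). The bi-gated fusion rule itself is not specified in the abstract and is not reproduced; all function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def conv1d(x: Sequence[float], w: Sequence[float]) -> List[float]:
    """Valid 1-D cross-correlation (stride 1, no padding)."""
    k = len(w)
    return [sum(w[j] * x[i + j] for j in range(k)) for i in range(len(x) - k + 1)]


def partial_conv1d(x: Sequence[float], mask: Sequence[float], w: Sequence[float],
                   bias: float = 0.0) -> Tuple[List[float], List[float]]:
    """Partial convolution: mask is binary, 1 = valid pixel, 0 = hole."""
    k = len(w)
    out: List[float] = []
    new_mask: List[float] = []
    for i in range(len(x) - k + 1):
        window = mask[i:i + k]
        valid = sum(window)
        if valid > 0:
            # Convolve only valid pixels, then rescale by (window size / valid count).
            s = sum(w[j] * x[i + j] * window[j] for j in range(k))
            out.append(s * (k / valid) + bias)
            new_mask.append(1.0)  # any valid input marks this output as valid
        else:
            out.append(0.0)       # window lies entirely inside the hole
            new_mask.append(0.0)
    return out, new_mask


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def gated_conv1d(x: Sequence[float], w_feat: Sequence[float],
                 w_gate: Sequence[float]) -> List[float]:
    """Gated convolution: ReLU(feature branch) * sigmoid(gating branch)."""
    feat = conv1d(x, w_feat)
    gate = conv1d(x, w_gate)
    return [max(0.0, f) * sigmoid(g) for f, g in zip(feat, gate)]
```

For example, `partial_conv1d([1, 2, 3, 4], [1, 0, 1, 1], [1, 1, 1])` returns `([6.0, 10.5], [1.0, 1.0])`: the hole pixel is ignored and each output is rescaled by 3/2. The binary mask can only switch pixels on or off, whereas the gates in `gated_conv1d` are continuous and learned, which is the trade-off the abstract describes.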


“…Furthermore, based on this technology, people can leverage multimodal information to guide the image restoration process. For example, text descriptions or shape constraints can be used to guide the repair of missing areas within the image [10,11], resulting in more comprehensive and accurate image content restoration.…”

Section: Introduction (mentioning)

confidence: 99%


“…However, these models are limited in filling in the content using out-of-mask context only. A more flexible usage is adding text control [2,3,60,66-68] that allow for text-guided image inpainting. Latent Blended Diffusion [3] proposed blending the generated and original image latents, Imagenator [60] and Diffusion-based Inpainting [45] fine-tune pre-trained text-to-image generation models with masked images as additional input, and SmartBrush [66] fine-tunes an additional mask prediction branch on object-centric datasets.…”

Section: Related Work (mentioning)

confidence: 99%
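The quoted passage names two ways of bringing known pixels into text-guided inpainting: blending generated and original latents, and fine-tuning a model that receives the masked image as additional input. A minimal sketch of both, assuming single-channel 2-D grids stored as nested lists and a mask in which 1 marks the region to regenerate. In a real blended-diffusion loop the original latent would first be noised to the current timestep, a step omitted here. The function names and the channel ordering are illustrative assumptions, not the cited papers' code.

```python
from typing import List

Grid = List[List[float]]  # a single-channel 2-D latent or image, row-major


def blend_latents(z_gen: Grid, z_orig: Grid, mask: Grid) -> Grid:
    """Keep generated values inside the mask (1) and original values outside it (0)."""
    return [[m * g + (1.0 - m) * o for g, o, m in zip(row_g, row_o, row_m)]
            for row_g, row_o, row_m in zip(z_gen, z_orig, mask)]


def inpainting_input(noisy: Grid, image: Grid, mask: Grid) -> List[Grid]:
    """Build a [noisy sample, masked image, mask] channel stack (ordering is an assumption)."""
    # Zero out the pixels to be regenerated so the model sees only the known context.
    masked = [[px * (1.0 - m) for px, m in zip(row_i, row_m)]
              for row_i, row_m in zip(image, mask)]
    return [noisy, masked, mask]
```

Blending leaves the denoiser untouched and enforces the known context from outside, while the channel-stack approach requires fine-tuning but lets the model see the context and the mask directly at every step.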

Deep learning in fault detection and diagnosis of building HVAC systems: A systematic review with meta analysis

Zhang, Saeed, Sadeghian (2023). Energy and AI.


“…[149] Ref. Image, Text ✓ ✓ ✓ ✓ SmartBrush [150] Text, Mask ✓ ✓ ✓ IIR-Net [151] Text ✓ ✓ ✓ PowerPaint [152] Text, Mask ✓ ✓ ✓ Imagen Editor [153] Text, Mask ✓ ✓ ✓ ✓ ✓ SmartMask [154] Text ✓ Uni-paint [155] Text, Mask, Ref. [173] Text, Ref.…”

mentioning

confidence: 99%

“…Anydoor [147], FADING [148], PAIR Diffusion [149], SmartBrush [150], IIR-Net [151], PowerPaint [152], Imagen Editor [153], SmartMask [154], Uni-paint [155] Instructional Editing via Full Supervision InstructPix2Pix [156], MoEController [157], FoI [158], LOFIE [159], InstructDiffusion [160], Emu Edit [161], DialogPaint [162], Inst-Inpaint [163], HIVE [164], ImageBrush [165], InstructAny2Pix [166], MGIE [167], SmartEdit [168] Pseudo-Target Retrieval with Weak Supervision iEdit [169], TDIELR [170], ChatFace [171] Fig. 2: Taxonomy of training-based approaches for image editing.…”

mentioning

confidence: 99%

M3VSNET: Unsupervised Multi-Metric Multi-View Stereo Network

Huang et al. (2021). 2021 IEEE International Conference on Image Processing (ICIP).


Current multi-view stereo (MVS) methods built on supervised learning-based networks perform impressively compared with traditional MVS methods. However, ground-truth depth maps for training are hard to obtain and cover only a limited range of scenarios. In this paper, we propose a novel unsupervised multi-metric MVS network, named M3VSNet, for dense point cloud reconstruction without any supervision. To improve the robustness and completeness of point cloud reconstruction, we propose a novel multi-metric loss function that combines pixel-wise and feature-wise loss functions to learn the inherent constraints of matching correspondences from different perspectives. We also incorporate normal-depth consistency in the 3D point cloud format to improve the accuracy and continuity of the estimated depth maps. Experimental results show that M3VSNet establishes the state of the art among unsupervised methods, outperforms the previous supervised MVSNet on the DTU dataset, and demonstrates strong generalization ability on the Tanks & Temples benchmark.
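The abstract describes the multi-metric loss only as a combination of pixel-wise and feature-wise terms. A minimal sketch, assuming an L1 distance for both terms, a reference view compared against a source view warped into it using the predicted depth, and hypothetical weights `alpha` and `beta`; the paper's actual terms and weights are not given in the abstract.

```python
from typing import Sequence


def l1(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean absolute difference between two equal-length vectors."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def multi_metric_loss(ref_pix: Sequence[float], warped_pix: Sequence[float],
                      ref_feat: Sequence[float], warped_feat: Sequence[float],
                      alpha: float = 0.8, beta: float = 0.2) -> float:
    """Weighted sum of a pixel-wise and a feature-wise L1 term (weights are hypothetical)."""
    pixel_term = l1(ref_pix, warped_pix)      # photometric agreement after warping
    feature_term = l1(ref_feat, warped_feat)  # agreement of deep features after warping
    return alpha * pixel_term + beta * feature_term
```

The feature-wise term is meant to stay informative where raw intensities are unreliable (for example, under lighting changes), which is one plausible reading of learning constraints "from different perspectives".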
