2021
DOI: 10.48550/arxiv.2103.10428
Preprint
Large Scale Image Completion via Co-Modulated Generative Adversarial Networks

Abstract: Numerous task-specific variants of conditional generative adversarial networks have been developed for image completion. Yet, a serious limitation remains that all existing algorithms tend to fail when handling large-scale missing regions. To overcome this challenge, we propose a generic new approach that bridges the gap between image-conditional and recent modulated unconditional generative architectures via co-modulation of both conditional and stochastic style representations. Also, due to the lack of good …
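The co-modulation idea in the abstract can be illustrated with a minimal numerical sketch: an image-conditional feature vector and a mapped noise vector are concatenated and passed through a joint affine layer to produce a single style vector that would modulate the decoder. The dimensions, the `co_modulated_style` function, and the one-layer mapping network below are illustrative stand-ins, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions, chosen for illustration only.
COND_DIM, Z_DIM, STYLE_DIM = 64, 32, 48

# Learned parameters (random stand-ins here).
W_mapping = rng.standard_normal((Z_DIM, STYLE_DIM)) * 0.1              # mapping network (one layer)
W_affine = rng.standard_normal((COND_DIM + STYLE_DIM, STYLE_DIM)) * 0.1  # joint affine

def co_modulated_style(cond_feat, z):
    """Produce one style vector from both the image-conditional
    encoder features and a stochastic latent code."""
    w = np.tanh(z @ W_mapping)              # stochastic style from the mapping network
    joint = np.concatenate([cond_feat, w])  # bridge conditional + stochastic branches
    return joint @ W_affine                 # flat affine -> modulation coefficients

cond_feat = rng.standard_normal(COND_DIM)   # stands in for the encoder output E(y)
s1 = co_modulated_style(cond_feat, rng.standard_normal(Z_DIM))
s2 = co_modulated_style(cond_feat, rng.standard_normal(Z_DIM))
# Same condition, different noise -> different styles (stochastic diversity).
```

Because the noise branch survives into the final style, the same masked input can yield diverse completions, while the conditional branch keeps them consistent with the visible pixels.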

Cited by 24 publications (51 citation statements)
References 70 publications (85 reference statements)
“…ProFill [52] utilizes a contextual attention module and progressively fills the hole with predicted high-confidence pixels. CoModGAN [56] proposes a co-modulation of both conditional and stochastic representations to fill in high-quality content. Lastly, we compare against a two-view SfM-based approach [57], which we refer to as JointDP, by warping the source image with the jointly estimated relative pose and depth using dense correspondence.…”
Section: Methods
confidence: 99%
“…To overcome this issue, Zheng et al [48] and Zhao et al [46] propose VAE-based networks that trade off between diversity and reconstruction. Zhao et al [47], inspired by the StyleGAN2 [15] modulated convolutions, introduce a co-modulation layer for the inpainting task in order to improve both diversity and reconstruction. Notably, a new family of auto-regressive methods [27,34,42], which can handle irregular masks, has recently emerged as a powerful alternative for free-form image inpainting.…”
Section: Related Work
confidence: 99%
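The StyleGAN2 modulated convolution referenced here scales each input channel of the convolution weights by a per-channel style coefficient, then renormalizes ("demodulates") each output filter to approximately unit norm. A rough numpy sketch (shapes and the `modulate_demodulate` name are illustrative):

```python
import numpy as np

def modulate_demodulate(weight, style, eps=1e-8):
    """StyleGAN2-style weight modulation: scale input channels by the
    style vector, then demodulate each output filter to unit L2 norm."""
    # weight: (out_ch, in_ch, k, k); style: (in_ch,)
    w = weight * style[None, :, None, None]                     # modulate
    demod = 1.0 / np.sqrt((w ** 2).sum(axis=(1, 2, 3)) + eps)   # per-filter scale
    return w * demod[:, None, None, None]                       # demodulate

rng = np.random.default_rng(0)
weight = rng.standard_normal((8, 4, 3, 3))
style = rng.standard_normal(4)
w_mod = modulate_demodulate(weight, style)
# Each output filter of w_mod now has (approximately) unit L2 norm.
```

The demodulation step is what lets the style control feature statistics without the activation blow-ups that plain per-channel scaling would cause.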
“…However, the enforcement of diversity deteriorates the image quality. Inspired by the recent modulation approach [53] for multimodal image inpainting, we propose a similar network architecture specifically for open-domain image editing. The difference is that our modulation layer does not use the features of the input image, which leads to better fidelity and diversity.…”
Section: Related Work
confidence: 99%
“…Note that we circumvent direct pixel supervision such as an L1 loss [16] to encourage generation diversity, as suggested in [53]. Some qualitative output results from our trained generator are visualized in Fig.…”
Section: Multimodal Image Editing As Pretraining
confidence: 99%