Diverse Image Inpainting with Bidirectional and Autoregressive Transformers

Yu, Yingchen; Zhan, Fangneng; Wu, Rongliang; Pan, Jianxiong; Cui, Kaiwen; Lu, Shijian; Ma, Feiying; Xie, Xuansong; Chen, Miao

doi:10.1145/3474085.3475436

Cited by 88 publications

(61 citation statements)

References 30 publications

“…Image Generation Loss: Image generation tasks entail various losses to achieve dedicated purposes in image synthesis [23,24,32,39,40,43,44,47,48,49]. For instance, unpaired image translation is usually associated with certain losses to encourage correlation between the input and output images.…”

Section: Related Work (mentioning)

confidence: 99%

Modulated Contrast for Versatile Image Synthesis

Zhan, Zhang, Yu et al. 2022

Preprint

Self Cite

Perceiving the similarity between images has been a long-standing and fundamental problem underlying various visual generation tasks. Predominant approaches measure the inter-image distance by computing pointwise absolute deviations, which tends to estimate the median of instance distributions and leads to blurs and artifacts in the generated images. This paper presents MoNCE, a versatile metric that introduces image contrast to learn a calibrated metric for the perception of multifaceted inter-image distances. Unlike vanilla contrast which indiscriminately pushes negative samples from the anchor regardless of their similarity, we propose to re-weight the pushing force of negative samples adaptively according to their similarity to the anchor, which facilitates the contrastive learning from informative negative samples. Since multiple patch-level contrastive objectives are involved in image distance measurement, we introduce optimal transport in MoNCE to modulate the pushing force of negative samples collaboratively across multiple contrastive objectives. Extensive experiments over multiple image translation tasks show that the proposed MoNCE outperforms various prevailing metrics substantially. The code is available at MoNCE.

“…For example, Wan et al [33] propose the first transformer based image inpainting method to get the image prior and send the image prior to a CNN. To incorporate the image prior, the approach of [37] designs a bidirectional and autoregressive transformer. More recently, …”

Section: Related Work, 2.1 Image Inpainting (mentioning)

confidence: 99%

“…Recent years, most state-of-the-art approaches are mainly based on convolutional neural networks or transformer. In the approaches of [22,35,38,40], they apply the convolutional neural networks for image inpainting, while other line of research [33,37] leverages the transformer in image inpainting at the low-resolution image space, and then introduces the GAN based networks for high quality image generation. Suvorov et al [31] utilize the Fast Fourier Convolution (FFC) instead of regular convolution to obtain features of global receptive fields in frequency domain.…”

Section: Introduction (mentioning)

confidence: 99%

GLaMa: Joint Spatial and Frequency Loss for General Image Inpainting

Lu, Jiang, Huang et al. 2022

Preprint

The purpose of image inpainting is to recover scratches and damaged areas using context information from remaining parts. In recent years, thanks to the resurgence of convolutional neural networks (CNNs), image inpainting task has made great breakthroughs. However, most of the work consider insufficient types of mask, and their performance will drop dramatically when encountering unseen masks. To combat these challenges, we propose a simple yet general method to solve this problem based on the LaMa image inpainting framework [35], dubbed GLaMa. Our proposed GLaMa can better capture different types of missing information by using more types of masks. By incorporating more degraded images in the training phase, we can expect to enhance the robustness of the model with respect to various masks. In order to yield more reasonable results, we further introduce a frequency-based loss in addition to the traditional spatial reconstruction loss and adversarial loss. In particular, we introduce an effective reconstruction loss both in the spatial and frequency domain to reduce the chessboard effect and ripples in the reconstructed image. Extensive experiments demonstrate that our method can boost the performance over the original LaMa method for each type of mask on FFHQ [18], ImageNet [7], Places2 [42] and WikiArt [28] dataset. The proposed GLaMa was ranked first in terms of PSNR, LPIPS [39] and SSIM [34] in the NTIRE 2022 Image Inpainting Challenge Track 1 Unsupervised [27].

“…Nevertheless, the above approaches expose a common drawback in recovering the image global structure. Therefore, many studies improve the network to better recover the global structure by introducing relevant structural priors [4,23,25,31,43]. However, these low-level structural priors are difficult to obtain, under the large corrupted regions.…”

Section: Introduction (mentioning)

confidence: 99%

“…• We propose InCo² Loss, a pair of similarity based losses to further improve the inter-coordination between the corrupted and non-corrupted regions and the intra-coordination in corrupted regions. [43] employ autoregressive transformers to inpaint diverse faces. However, these methods generally ignore the modeling of the facial internal correlations, and limit the refinement of the specific facial semantic regions.…”

Section: Introduction (mentioning)

confidence: 99%

ShowFace: Coordinated Face Inpainting with Memory-Disentangled Refinement Networks

Wu, Qi, Wang et al. 2022

Preprint

Face inpainting aims to complete the corrupted regions of the face images, which requires coordination between the completed areas and the non-corrupted areas. Recently, memory-oriented methods illustrate great prospects in the generation related tasks by introducing an external memory module to improve image coordination. However, such methods still have limitations in restoring the consistency and continuity for specific facial semantic parts. In this paper, we propose the coarse-to-fine Memory-Disentangled Refinement Networks (MDRNets) for coordinated face inpainting, in which two collaborative modules are integrated, Disentangled Memory Module (DMM) and Mask-Region Enhanced Module (MREM). Specifically, the DMM establishes a group of disentangled memory blocks to store the semantic-decoupled face representations, which could provide the most relevant information to refine the semantic-level coordination. The MREM involves a masked correlation mining mechanism to enhance the feature relationships into the corrupted regions, which could also make up for the correlation loss caused by memory disentanglement. Furthermore, to better improve the inter-coordination between the corrupted and non-corrupted regions and enhance the intra-coordination in corrupted regions, we design InCo² Loss, a pair of similarity based losses to constrain the feature consistency. Eventually, extensive experiments conducted on CelebA-HQ and FFHQ datasets demonstrate the superiority of our MDRNets compared with previous State-Of-The-Art methods. CCS Concepts: Computing methodologies → Computer vision.
