Multimodal Image Synthesis and Editing: A Survey

Zhan, Fangneng; Yu, Yingchen; Wu, Rongliang; Zhang, Jiahui; Lu, Shijian; Liu, Lingjie; Kortylewski, Adam; Theobalt, Christian; Xing, Eric P.

doi:10.48550/arxiv.2112.13592

Cited by 13 publications

(15 citation statements)

References 155 publications

(288 reference statements)


“…Meanwhile, the advances of deep learning provide powerful tools such as CNN to process image data. The image editing task aims to generate a new image from a source image by editing the contents of the source image under certain guidance while keeping other properties unchanged [265]. The input and output of the model for image editing are the images represented by the pixel matrix with multiple color channels.…”

Section: Image Editing (mentioning)

confidence: 99%

Controllable Data Generation by Deep Learning: A Review

Wang, Du, Guo et al. 2022

Preprint


Designing and generating new data with targeted properties underpins critical applications such as molecule design, image editing, and speech synthesis. Traditional hand-crafted approaches rely heavily on expert experience and intensive human effort, yet still suffer from insufficient scientific knowledge and low throughput, which limits effective and efficient data generation. Recently, advances in deep learning have produced expressive methods that can learn the underlying representation and properties of data. This capability provides new opportunities to work out the mutual relationship between the structural patterns and functional properties of data, and to leverage that relationship to generate structural data with the desired properties. This article provides a systematic review of this promising research area, commonly known as controllable deep data generation. First, the potential challenges are raised and preliminaries are provided. Then controllable deep data generation is formally defined, a taxonomy of techniques is proposed, and the evaluation metrics in this domain are summarized. After that, applications of controllable deep data generation are introduced, and existing works are experimentally analyzed and compared. Finally, promising future directions of controllable deep data generation are highlighted and five potential challenges are identified.


“…Image Generation Loss. Image generation tasks entail various losses to achieve dedicated purposes in image synthesis [23, 24, 32, 39, 40, 43, 44, 47-49]. For instance, unpaired image translation is usually associated with certain losses to encourage correlation between the input and output images.…”

Section: Related Work (mentioning)

confidence: 99%

Modulated Contrast for Versatile Image Synthesis

Zhan, Zhang, Yu et al. 2022

Preprint

Self Cite


Perceiving the similarity between images has been a long-standing and fundamental problem underlying various visual generation tasks. Predominant approaches measure the inter-image distance by computing pointwise absolute deviations, which tends to estimate the median of instance distributions and leads to blurs and artifacts in the generated images. This paper presents MoNCE, a versatile metric that introduces image contrast to learn a calibrated metric for the perception of multifaceted inter-image distances. Unlike vanilla contrast which indiscriminately pushes negative samples from the anchor regardless of their similarity, we propose to re-weight the pushing force of negative samples adaptively according to their similarity to the anchor, which facilitates the contrastive learning from informative negative samples. Since multiple patch-level contrastive objectives are involved in image distance measurement, we introduce optimal transport in MoNCE to modulate the pushing force of negative samples collaboratively across multiple contrastive objectives. Extensive experiments over multiple image translation tasks show that the proposed MoNCE outperforms various prevailing metrics substantially. The code is available at MoNCE.


“…Due to the superior generation capability, GAN-based image-to-image translation [25, 34-38, 44, 52] has been extensively investigated and achieved remarkable progress on translating different conditions such as semantic segmentation [10,25,32,41,43], key points [20,22,40,42] and edge maps [15,39,53].…”

Section: Image-to-image Translation (mentioning)

confidence: 99%

Marginal Contrastive Correspondence for Guided Image Generation

Zhan, Yu, Wu et al. 2022

Preprint

Self Cite


Exemplar-based image translation establishes dense correspondences between a conditional input and an exemplar (from two different domains) so that detailed exemplar styles can be leveraged for realistic image translation. Existing work builds the cross-domain correspondences implicitly by minimizing feature-wise distances across the two domains. Without explicit exploitation of domain-invariant features, this approach may not reduce the domain gap effectively, which often leads to sub-optimal correspondences and image translation. We design a Marginal Contrastive Learning Network (MCL-Net) that uses contrastive learning to learn domain-invariant features for realistic exemplar-based image translation. Specifically, we design a marginal contrastive loss that guides the explicit establishment of dense correspondences. Nevertheless, building correspondences with domain-invariant semantics alone may impair the texture patterns and lead to degraded texture generation. We thus design a Self-Correlation Map (SCM) that incorporates scene structures as auxiliary information, which improves the built correspondences substantially. Quantitative and qualitative experiments on a variety of image translation tasks show that the proposed method consistently outperforms the state of the art.
