Retrieval-based Spatially Adaptive Normalization for Semantic Image Synthesis

Shi, Yupeng; Liu, Xiao; Wei, Yuxiang; Wu, Zhongqin; Zuo, Wangmeng

doi:10.1109/cvpr52688.2022.01094

Cited by 18 publications

(7 citation statements)

References 22 publications


“…In supervised baseline models, the earlier CRN [29] and SIMS [35] are trained without using adversarial training. However, the GAN-based supervised baselines can be further subdivided into other [50–53], normalization [22, 53–58], attention [7, 8, 23, 31, 59, 60], and discriminator [30, 32, 34, 61] according to the improvement direction.…”

Section: Experiments, Experimental Settings (mentioning)

confidence: 99%

UNet-like network fused Swin Transformer and CNN for Semantic Image Synthesis

Ke, Luo, Cai

2024

Preprint


Semantic image synthesis approaches have been dominated by Convolutional Neural Network (CNN) models. Due to the limitations of local perception, their performance improvement seems to have plateaued in recent years. To tackle this issue, we propose the TransUNet model, a UNet-like network that fuses Swin Transformer and CNN for semantic image synthesis. Photorealistic image synthesis conditioned on a given semantic layout depends on high-level semantics and low-level positions. To improve synthesis performance, we design a novel conditional residual fusion module for the model decoder to efficiently fuse the hierarchical feature maps extracted at different scales. Moreover, this module combines the opposition-based learning mechanism and the weight assignment mechanism to enhance and attend to the semantic information. Compared to pure CNN-based models, our TransUNet combines local and global perception to better extract high- and low-level features and better fuse multi-scale features. We have conducted extensive comparison experiments, both quantitative and qualitative, to validate the effectiveness of the proposed TransUNet model for semantic image synthesis. The results show that TransUNet clearly outperforms the state-of-the-art model on three benchmark datasets of real-scene images (Cityscapes, ADE20K, and COCO-Stuff).


“…Generative Adversarial Networks (GANs) [7], trained in an adversarial way to achieve Nash equilibrium, have been successfully applied to all sorts of image synthesis tasks such as image editing [1,4,7,9], image manipulation [11,13] and image synthesis [4,10,12,14]. With continuous improvements in GAN-based frameworks, optimization and regularization, image generation by GANs is becoming more realistic and efficient.…”

Section: Deep Generative Models (mentioning)

confidence: 99%

“…In this section, we evaluate the generated image quality of the proposed method by comparing it with several semantic image synthesis methods on the FID, mIoU and Accuracy metrics. We select five recent state-of-the-art methods, Pix2PixHD [1], OASIS [2], SPADE, SEAN and RESAIL [7], as the comparison methods. The comparative test is performed on the Cityscapes dataset.…”

Section: Quantitative and Qualitative Comparisons (mentioning)

confidence: 99%

“…However, for complex scenes, such as autonomous driving scenarios, the semantic map of a single modality lacks detailed description, and the generated results are often low-quality and monotonous. Although recent advances [4,7] in SIS realize region-style control by adding realistic styles to each region according to its label, such simple style injection at a single level is inefficient and unable to obtain richer style information for controlling style, which further damages the generation of details.…”

Section: Introduction (mentioning)

confidence: 99%


Instance-level image synthesis method based on multi-scale style transformation

Yang, Shao, Qin et al. 2023

Fourteenth International Conference on Graphics and Image Processing (ICGIP 2022)


Semantic image synthesis aims to synthesize photorealistic images according to a given semantic layout. Existing methods build a single-scale style encoder based on semantic regions; because they inject style at only a single level, they are unable to extract rich style information. Especially for different instance objects in the same semantic region, single-scale networks tend to generate the same style and control style ineffectively. To cope with this issue, we propose a Multi-Scale Instance-level image synthesis method (MSIN). To learn more discriminative representations from different feature levels within an instance, a multi-scale style encoder is designed to extract more details than a traditional single-scale style encoder; it adopts a "pyramid" structure to contact contextual information. In addition, to synthesize visually pleasing and photorealistic images, MSIN leverages a region-style fusion mechanism in the adaptive normalization layer, which realizes instance-wise, object-to-object multi-style generation simultaneously. Compared with previous methods, our method can generate images with fine details and control style at the level of instance objects, with semantics that are more reasonable and more diverse across instance objects. The experimental results demonstrate the superiority of MSIN on semantic image synthesis tasks: it outperforms existing methods in terms of instance objects and diverse generation.


“…Such is the case for Generative Adversarial Networks (GANs) [12] and Variational Auto-Encoders (VAEs) [19]. In addition, some generative models allow for conditioning [24, 25,33,6,34,31], opening the door to models that provide data as a function of the user's query.…”

Section: Introduction (mentioning)

confidence: 99%

Can segmentation models be trained with fully synthetically generated data?

Fernández, Pinaya, Borges et al. 2022

Preprint


In order to achieve good performance and generalisability, medical image segmentation models should be trained on sizeable datasets with sufficient variability. Due to ethics and governance restrictions, and the costs associated with labelling data, scientific development is often stifled, with models trained and tested on limited data. Data augmentation is often used to artificially increase the variability in the data distribution and improve model generalisability. Recent works have explored deep generative models for image synthesis, as such an approach would enable the generation of an effectively infinite amount of varied data, addressing the generalisability and data access problems. However, many proposed solutions limit the user's control over what is generated. In this work, we propose brainSPADE, a model which combines a synthetic diffusion-based label generator with a semantic image generator. Our model can produce fully synthetic brain labels on-demand, with or without pathology of interest, and then generate a corresponding MRI image of an arbitrary guided style. Experiments show that brainSPADE synthetic data can be used to train segmentation models with performance comparable to that of models trained on real data.
