Dual Attention GANs for Semantic Image Synthesis

Tang, Hao; Bai, Song; Sebe, Nicu

doi:10.1145/3394171.3416270

Cited by 65 publications

(36 citation statements)

References 84 publications

“…After summing with the temporal embedding $E_t \in \mathbb{R}^{T \times C \times V}$, the resulting feature vector $Z$ is sent to the AniFormer encoder. In our case, original batch normalization layers are replaced with Instance Normalization [29] layers to preserve the instance style [15, 19, 27, 28]. AniFormer Encoder.…”

Section: AniFormer: Transformer-based Network for 3D Animation (mentioning)

confidence: 99%
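The excerpt above swaps batch normalization for instance normalization. The two differ only in which values the mean and variance are computed over: batch normalization pools each channel over every sample in the batch, while instance normalization uses each sample's own statistics. The minimal sketch below illustrates this in plain Python for a tensor of shape (N, C, L) stored as nested lists. Learnable affine parameters and running statistics are omitted, and the function names are illustrative rather than taken from AniFormer's code.

```python
import math


def instance_norm(x, eps=1e-5):
    """Instance normalization for nested lists of shape (N, C, L).

    Each (sample, channel) slice is normalized with its own mean and
    variance, computed over the L positions (e.g. mesh vertices or pixels).
    """
    out = []
    for sample in x:
        normed_sample = []
        for channel in sample:
            mean = sum(channel) / len(channel)
            var = sum((v - mean) ** 2 for v in channel) / len(channel)
            normed_sample.append([(v - mean) / math.sqrt(var + eps) for v in channel])
        out.append(normed_sample)
    return out


def batch_norm(x, eps=1e-5):
    """Batch normalization with training-mode statistics and no affine parameters.

    Each channel is normalized with statistics pooled over the whole batch
    and all L positions, so every sample's output depends on the others.
    """
    n, c = len(x), len(x[0])
    out = [[None] * c for _ in range(n)]
    for ch in range(c):
        # Pool channel `ch` across all samples and positions.
        values = [v for i in range(n) for v in x[i][ch]]
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)
        for i in range(n):
            out[i][ch] = [(v - mean) / math.sqrt(var + eps) for v in x[i][ch]]
    return out
```

For an input of two samples whose single channel differs only by a scale factor, such as `[[[1.0, 2.0, 3.0]], [[10.0, 20.0, 30.0]]]`, instance normalization maps both samples to the same normalized profile, whereas with batch normalization the first sample's output is shifted by the second sample's statistics.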

AniFormer: Data-driven 3D Animation with Transformer

Chen, Tang, Sebe et al. 2021

Preprint

Self Cite

We present a novel task: animating a target 3D object through the motion of a raw driving sequence. In previous works, extra auxiliary correlations between source and target meshes, or intermediate factors, are needed to capture the motions in the driving sequences. Instead, we introduce AniFormer, a novel Transformer-based architecture that generates animated 3D sequences by directly taking the raw driving sequences and arbitrary same-type target meshes as inputs. Specifically, we customize the Transformer architecture for 3D animation so that it generates mesh sequences by integrating styles from the target meshes and motions from the driving meshes. In addition, instead of the conventional single regression head of the vanilla Transformer, AniFormer generates multiple frames as outputs to preserve the sequential consistency of the generated meshes. To achieve this, we carefully design a pair of regression constraints, i.e., motion and appearance constraints, that provide strong regularization on the generated mesh sequences. AniFormer achieves high-fidelity, realistic, temporally coherent animation results and outperforms the compared state-of-the-art methods on benchmarks of diverse categories. Code is available at https://github.com/mikecheninoulu/AniFormer.

“…Generative Adversarial Networks (GANs). Over the last few years, GANs [16] have been shown to be effective in many image generation and translation tasks [18, 24, 37, 39–44, 57]. For example, Isola et al. [18] propose the Pix2Pix adversarial learning framework for paired image generation.…”

Section: Related Work (mentioning)

confidence: 99%
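For reference, the Pix2Pix framework cited in this excerpt trains a generator $G$ against a conditional discriminator $D$, where $x$ is the input image (for example, a semantic label map), $y$ the paired target image, $z$ random noise, and $\lambda$ a weighting hyperparameter. As formulated by Isola et al., the conditional adversarial loss is combined with an L1 reconstruction term:

```latex
\begin{aligned}
\mathcal{L}_{cGAN}(G, D) &= \mathbb{E}_{x,y}\left[\log D(x, y)\right] + \mathbb{E}_{x,z}\left[\log\left(1 - D(x, G(x, z))\right)\right] \\
\mathcal{L}_{L1}(G) &= \mathbb{E}_{x,y,z}\left[\lVert y - G(x, z) \rVert_1\right] \\
G^{*} &= \arg\min_{G}\max_{D}\; \mathcal{L}_{cGAN}(G, D) + \lambda\, \mathcal{L}_{L1}(G)
\end{aligned}
```

Because $D$ sees the input $x$ alongside either the real $y$ or the generated $G(x, z)$, the adversarial term judges whether an output matches its input, which is what makes the setup suitable for paired image generation.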

Cross-View Exocentric to Egocentric Video Synthesis

Liu, Tang, Latapie et al. 2021

Preprint

Self Cite

The cross-view video synthesis task seeks to generate video sequences of one view from another, dramatically different view. In this paper, we investigate the exocentric (third-person) to egocentric (first-person) view video generation task. This is challenging because the egocentric view is sometimes remarkably different from the exocentric view, so transforming appearances across the two views is non-trivial. Specifically, we propose a novel Bi-directional Spatial Temporal Attention Fusion Generative Adversarial Network (STA-GAN) that learns both spatial and temporal information to generate egocentric video sequences from the exocentric view. The proposed STA-GAN consists of three parts: a temporal branch, a spatial branch, and attention fusion. First, the temporal and spatial branches generate a sequence of fake frames and their corresponding features. The fake frames are generated in both the downstream and upstream directions for both branches. Next, the four resulting sets of fake frames and their corresponding features (spatial and temporal branches in two directions) are fed into a novel multi-generation attention fusion module to produce the final video sequence. Meanwhile, we also propose a novel temporal and spatial dual-discriminator for more robust network optimization. Extensive experiments on the Side2Ego and Top2Ego datasets [11] show that the proposed STA-GAN significantly outperforms existing methods.
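The abstract above describes fusing four generated candidates (temporal and spatial branches, each in two directions) with an attention module. A common way to build this kind of multi-candidate fusion is a per-pixel softmax over the candidates followed by a weighted sum, and the sketch below assumes that design. It is not STA-GAN's actual module: the function name `attention_fuse` is hypothetical, frames are flat lists of grayscale pixel values, and the attention logits are passed in directly instead of being predicted from features by a network.

```python
import math


def attention_fuse(candidates, logits):
    """Fuse K candidate frames into one frame with per-pixel softmax attention.

    candidates: K frames, each a flat list of P pixel values.
    logits: K lists of P unnormalized attention scores. In a real network these
            would be predicted from the candidates' features; here they are inputs.
    Returns (fused frame of P values, attention weights as a K x P nested list).
    """
    k, p = len(candidates), len(candidates[0])
    fused = [0.0] * p
    weights = [[0.0] * p for _ in range(k)]
    for j in range(p):
        # Softmax over the K candidates at pixel j, shifted by the max for stability.
        m = max(logits[i][j] for i in range(k))
        exps = [math.exp(logits[i][j] - m) for i in range(k)]
        total = sum(exps)
        for i in range(k):
            weights[i][j] = exps[i] / total
            # Weighted sum of the candidates' values at this pixel.
            fused[j] += weights[i][j] * candidates[i][j]
    return fused, weights
```

With equal logits, four candidates valued 1, 2, 3, and 4 fuse to their mean, 2.5; a much larger logit for one candidate at a pixel makes the output at that pixel essentially copy that candidate. Since the weights at each pixel sum to one, the fused value always lies between the smallest and largest candidate values there.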

“…Pix2pixHD [42] improves Pix2Pix by proposing a coarse-to-fine generator and discriminators. Subsequent methods [32, 27, 39, 46, 37, 51] further explore how to synthesize high-quality images from semantic masks and achieve significant improvements. Besides using class-level semantic masks, some works also consider instance-level information for image synthesis, since the semantic mask itself does not provide sufficient information to synthesize instances, especially in complex environments with multiple instances interacting with each other.…”

Section: Conditional Image Synthesis (mentioning)

confidence: 99%

Diverse Semantic Image Synthesis via Probability Distribution Modeling

Tan, Chai, Chen et al. 2021

Preprint

Semantic image synthesis, which translates semantic layouts into photo-realistic images, is a one-to-many mapping problem. Though impressive progress has recently been made, diverse semantic synthesis that can efficiently produce semantic-level multimodal results remains a challenge. In this paper, we propose a novel diverse semantic image synthesis framework from the perspective of semantic class distributions, which naturally supports diverse generation at the semantic or even instance level. We achieve this by modeling class-level conditional modulation parameters as continuous probability distributions instead of discrete values, and by sampling per-instance modulation parameters through instance-adaptive stochastic sampling that is consistent across the network. Moreover, we propose prior noise remapping, through linear perturbation parameters encoded from paired references, to facilitate supervised training and exemplar-based instance style control at test time. Extensive experiments on multiple datasets show that our method achieves superior diversity and comparable quality compared to state-of-the-art methods. Code will be available at https://github.com/tzt101/INADE.git
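The abstract above combines two ideas: class-level modulation parameters modeled as probability distributions, and per-instance samples drawn so that they stay consistent across the network. The toy sketch below illustrates one possible reading of that under stated assumptions: a Gaussian per class for a single scalar scale (gamma) and shift (beta), reparameterized sampling, and one noise draw per instance that every layer reuses. The class names, parameter values, and the functions `sample_instance_noise` and `modulate` are all hypothetical; this is not the INADE implementation.

```python
import random

# Hypothetical class-level parameters: for each class, (mean, std) of the
# scale (gamma) and shift (beta) of one feature channel. Gaussian is assumed.
CLASS_PARAMS = {
    "car": {"gamma": (1.0, 0.2), "beta": (0.0, 0.1)},
    "tree": {"gamma": (0.8, 0.3), "beta": (0.5, 0.2)},
}


def sample_instance_noise(instance_ids, seed=0):
    """Draw one (noise_gamma, noise_beta) pair per instance.

    Drawing once and reusing the pair at every layer keeps an instance's
    sampled style consistent across the network.
    """
    rng = random.Random(seed)
    return {iid: (rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for iid in instance_ids}


def modulate(normalized, class_of, instance_of, noise, params=CLASS_PARAMS):
    """Apply per-instance gamma and beta to already-normalized features.

    normalized: normalized feature value per pixel.
    class_of / instance_of: semantic class and instance id per pixel.
    gamma and beta are sampled as mean + std * noise (reparameterization).
    """
    out = []
    for x, cls, iid in zip(normalized, class_of, instance_of):
        (g_mu, g_sigma), (b_mu, b_sigma) = params[cls]["gamma"], params[cls]["beta"]
        n_g, n_b = noise[iid]
        gamma = g_mu + g_sigma * n_g
        beta = b_mu + b_sigma * n_b
        out.append(gamma * x + beta)
    return out
```

In this sketch, pixels of the same instance share one sampled (gamma, beta) pair, so each instance is styled uniformly, while two instances of the same class generally receive different samples. Calling `modulate` again with the same `noise` dictionary, as a deeper layer would, reproduces the same parameters; drawing new noise yields a different but still class-consistent style.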
