Deep Image Spatial Transformation for Person Image Generation

Ren, Yurui; Yu, Xiaoming; Chen, Junming; Li, Thomas; Li, Ge

doi:10.1109/cvpr42600.2020.00771

Cited by 181 publications

(191 citation statements)

References 23 publications

Supporting

Mentioning: 175

Contrasting

“…The performance may be vulnerable to parsing errors. Some other methods tackle this task by proposing efficient spatial transformation modules [17,24,26,27,29,33]. Siarohin et al [26] introduce deformable skip connections to spatially transform the source neural textures with a set of affine transformations.…”

Section: Related Work (mentioning)

confidence: 99%

“…Recently, advances in computer vision fields have made tremendous progress in generating realistic images [2,7,13,14]. Some algorithms [19,21,24,26] are proposed to automatically synthesize person images from references using learning-based methods. Formally, the pose-guided person image synthesis task aims to synthesize person images by transforming the poses of reference images according to the given modifications while preserving the reference identities.…”

Section: Introduction (mentioning)

confidence: 99%

Combining Attention with Flow for Person Image Synthesis

Ren et al. 2021

Proceedings of the 29th ACM International Conference on Multimedia

Self Cite

Pose-guided person image synthesis aims to synthesize person images by transforming reference images into target poses. In this paper, we observe that the commonly used spatial transformation blocks have complementary advantages. We propose a novel model by combining the attention operation with the flow-based operation. Our model not only takes the advantage of the attention operation to generate accurate target structures but also uses the flow-based operation to sample realistic source textures. Both objective and subjective experiments demonstrate the superiority of our model. Meanwhile, comprehensive ablation studies verify our hypotheses and show the efficacy of the proposed modules. Besides, additional experiments on the portrait image editing task demonstrate the versatility of the proposed combination.

“…At an early stage, multiple deep-learning structures, such as GAN [30] and variational autoencoders (VAE) [31], were attempted. Because features in a real scene are difficult to represent with a latent tensor of fixed length, recent models have been based on a GAN with an attention mechanism in the generator [32], [33]. In addition, adversarial learning has also shown its effectiveness with many other tasks, such as aging faces [34], elevation data simulation [35], 3D image-based unconditional geostatistical simulation [36], particle interaction [37] and ultrasonic image generation [38].…”

Section: GAN Models Related to LUCC Prediction (mentioning)

confidence: 99%

GAN-Based LUCC Prediction via the Combination of Prior City Planning Information and Land-Use Probability

Sun, Feng, et al. 2021

IEEE J. Sel. Top. Appl. Earth Observations Remote Sensing

“…On this basis, Men et al [64] put forward a new network architecture with style block connections and a human parser to separate the attributes and encode them respectively. In order to deal with person image spatial transformation problems, Ren et al [65] combine flow-based operations with attention mechanisms and the model consists of a Global Flow Field Estimator and a Local Neural Texture Renderer. Furthermore, [66], [67] also use an unsupervised manner to tackle this task via end-to-end training.…”

Section: Pose-guided Person Image Generation (mentioning)

confidence: 99%

Supervised Video-to-Video Synthesis for Single Human Pose Transfer

Wang, Huang, et al. 2021

IEEE Access

In this paper, we focus on human pose transfer in different videos, i.e., transferring the dance pose of a person in given video to a target person in the other video. Our methods can be summed up in three stages to tackle this challenging scenario. Firstly, we extract the frames and pose masks from the source video and target video. Secondly, we use our model to synthesize the frames of target person with the given dance pose. Thirdly, we refine the generated frames to improve the quality of outputs. Our model is built on three stages: 1) human pose extraction and normalization. 2) a GAN based on cross-domain correspondence mechanism to synthesize dance-guided person image in target video by consecutive frames and pose stick images. 3) coarse-to-fine generation strategy which includes two GANs: a GAN used to reconstruct human face in target video, the other generates smoothing frame sequences. Finally, we compress the sequential frames generated from our model into video format. Compared with previous works, our model manifests better person appearance consistency and time coherence in video-to-video synthesis for human motion transfer, which makes the generated video look more realistic. The qualitative and quantitative comparisons represent our approach performs significant improvements over the state-of-the-art methods. Experiments on synthetic frames and ground truth validate the effectiveness of the proposed method.

Index Terms: Generative adversarial network (GAN), image-to-image translation, video-to-video synthesis, pose-guided person image generation
