2021
DOI: 10.1609/aaai.v35i4.16391

C2F-FWN: Coarse-to-Fine Flow Warping Network for Spatial-Temporal Consistent Motion Transfer

Abstract: Human video motion transfer (HVMT) aims to synthesize videos in which one person imitates other persons' actions. Although existing GAN-based HVMT methods have achieved great success, they either fail to preserve appearance details due to the loss of spatial consistency between synthesized and exemplary images, or generate incoherent video results due to the lack of temporal consistency among video frames. In this paper, we propose the Coarse-to-Fine Flow Warping Network (C2F-FWN) for spatial-temporal consistent HVMT.…
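
Since the abstract is truncated and the exact C2F-FWN architecture is not shown here, the following is only a minimal PyTorch sketch of the general coarse-to-fine flow-warping idea, assuming a two-stage estimator: a coarse flow predicted at low resolution from the source image and a target condition (e.g., pose maps), upsampled and refined by a residual stage, and then used to warp the source. All module and variable names (warp, CoarseToFineFlow, target_cond) are illustrative, not the authors' code.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    def warp(x, flow):
        # Warp x (N,C,H,W) with a dense flow field (N,2,H,W) holding per-pixel
        # (dx, dy) offsets in pixel units.
        _, _, h, w = x.shape
        ys, xs = torch.meshgrid(torch.arange(h, device=x.device),
                                torch.arange(w, device=x.device), indexing="ij")
        base = torch.stack((xs, ys), dim=0).float()        # (2,H,W), x first
        coords = base.unsqueeze(0) + flow                  # displaced sampling positions
        gx = 2.0 * coords[:, 0] / (w - 1) - 1.0            # normalize to [-1, 1]
        gy = 2.0 * coords[:, 1] / (h - 1) - 1.0
        grid = torch.stack((gx, gy), dim=-1)               # (N,H,W,2) for grid_sample
        return F.grid_sample(x, grid, align_corners=True)

    class CoarseToFineFlow(nn.Module):
        # Two-stage flow estimator: a coarse flow predicted at 1/4 resolution is
        # upsampled, rescaled, and refined by a residual stage at full resolution.
        def __init__(self, in_ch):
            super().__init__()
            self.coarse = nn.Sequential(nn.Conv2d(in_ch, 64, 3, padding=1), nn.ReLU(),
                                        nn.Conv2d(64, 2, 3, padding=1))
            self.fine = nn.Sequential(nn.Conv2d(in_ch + 2, 64, 3, padding=1), nn.ReLU(),
                                      nn.Conv2d(64, 2, 3, padding=1))

        def forward(self, source, target_cond):
            x = torch.cat([source, target_cond], dim=1)    # image + target pose/layout maps
            x_low = F.interpolate(x, scale_factor=0.25, mode="bilinear", align_corners=False)
            flow_coarse = self.coarse(x_low)
            # Upsample the coarse flow and rescale its magnitude to full resolution.
            flow_coarse = 4.0 * F.interpolate(flow_coarse, scale_factor=4.0,
                                              mode="bilinear", align_corners=False)
            flow = flow_coarse + self.fine(torch.cat([x, flow_coarse], dim=1))
            return warp(source, flow), flow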

Cited by 12 publications (21 citation statements)
References 25 publications
“…• A Texture Alignment Module is proposed to align the features of the source image and the initially generated image to preserve more details like textures of clothes and edges of the bodies. We conduct extensive experiments on the iPER Dataset [25] and SoloDance Dataset [50]. Experimental results show that our model achieves the state-of-the-art both quantitatively and qualitatively.…”
Section: Introduction
confidence: 95%
“…However, the SMPL model [32] was only suitable for smooth human bodies; it cannot represent the human body with complex clothes. Different from warping the feature of the source image, C2F [50] estimated the optical flow of the clothing regions and directly warped the clothing region according to optical flow. Unfortunately, when the driving pose is greatly different from the source pose, such methods usually failed due to inaccurate flow estimations.…”
Section: Human Video Motion Transfer
confidence: 99%
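
To illustrate the flow-based warping this statement describes (estimate an optical flow for the clothing region and warp that region directly), a hedged two-line composition is sketched below; it reuses the warp helper from the earlier sketch, and source_frame, clothing_flow, clothing_mask, and background are hypothetical tensors, not names from C2F-FWN or the citing paper.

    # Illustrative composition (assumed, not taken from the paper): warp only the
    # clothing region of the source frame with its estimated flow, then paste it
    # over the rest of the synthesized frame. clothing_mask is an (N,1,H,W) soft mask.
    warped_clothes = warp(source_frame, clothing_flow)
    output = clothing_mask * warped_clothes + (1.0 - clothing_mask) * background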