2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr46437.2021.00159
High-Fidelity Neural Human Motion Transfer from Monocular Video

Cited by 24 publications (9 citation statements) | References 26 publications
“…Many approaches formulate the motion transfer problem as an image-to-image translation task. Kappel et al [15] divided the image translation task into four cascaded generative networks and proposed a structure network to learn wrinkles of garments, which generates high-quality results. Zhang et al [49] proposed a decoupled GAN to disentangle the shape and texture of clothing.…”
Section: Novel View/Pose Synthesis
confidence: 99%
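The cascade described in the excerpt above can be pictured as a chain of image-to-image generators, each stage conditioned on the outputs of the one before it. The following is a minimal PyTorch sketch, not the authors' architecture: the two-stage split, the module names (StageGenerator, CascadedTransfer), and all channel counts are illustrative assumptions.

# Hypothetical sketch of cascaded generators for motion transfer;
# HF-NHMT [15] uses four cascaded networks with its own losses.
import torch
import torch.nn as nn

class StageGenerator(nn.Module):
    # One stage: a small conv encoder-decoder mapping in_ch -> out_ch.
    def __init__(self, in_ch, out_ch, width=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, width, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(width, width, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(width, out_ch, 3, padding=1),
        )

    def forward(self, x):
        return self.net(x)

class CascadedTransfer(nn.Module):
    # The structure stage predicts an intermediate map (e.g. garment
    # wrinkles); the appearance stage decodes pose + structure into
    # the final RGB frame.
    def __init__(self, pose_ch=3):
        super().__init__()
        self.structure = StageGenerator(pose_ch, 8)
        self.appearance = StageGenerator(pose_ch + 8, 3)

    def forward(self, pose_map):
        s = self.structure(pose_map)
        rgb = self.appearance(torch.cat([pose_map, s], dim=1))
        return rgb, s

pose = torch.randn(1, 3, 256, 256)  # rendered pose/skeleton map
frame, structure = CascadedTransfer()(pose)
print(frame.shape)  # torch.Size([1, 3, 256, 256])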
“…With the corresponding UV-map, the geometry is rasterized using a neural texture by bilinear sampling and is then translated to an RGB image using a neural network. We compare our method with three state-of-the-art methods: Neural Body [33], HF-NHMT [15], and StylePeople [12]. The trained models of [33] and [15] were generated using the official implementations, and the trained models of [12] on 20 videos of SelfieVideo were provided by the authors.…”
Section: Applications
confidence: 99%
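The rasterization step quoted above (bilinear sampling of a learned neural texture at per-pixel UV coordinates, followed by a translation network) can be sketched with torch.nn.functional.grid_sample. This is an illustrative reconstruction, not code from the compared methods; the texture resolution, channel counts, and the to_rgb decoder are assumptions.

# Sketch of neural-texture sampling: a learned texture is sampled
# bilinearly at rasterized UV coordinates, then decoded to RGB.
# Hypothetical sizes; not taken from the cited implementations.
import torch
import torch.nn as nn
import torch.nn.functional as F

C, T = 16, 512                               # texture channels, resolution
neural_texture = nn.Parameter(torch.randn(1, C, T, T))

# uv: (1, H, W, 2) rasterized UV coordinates in [0, 1]
uv = torch.rand(1, 256, 256, 2)
grid = uv * 2.0 - 1.0                        # grid_sample expects [-1, 1]

# Bilinear sampling of the neural texture at the UV locations
sampled = F.grid_sample(neural_texture, grid, mode='bilinear',
                        align_corners=False)  # -> (1, C, 256, 256)

# Translate the sampled feature image to RGB with a small conv net
to_rgb = nn.Sequential(
    nn.Conv2d(C, 32, 3, padding=1), nn.ReLU(inplace=True),
    nn.Conv2d(32, 3, 3, padding=1), nn.Tanh(),
)
rgb = to_rgb(sampled)
print(rgb.shape)                              # torch.Size([1, 3, 256, 256])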
“…Head puppetry or "talking head generation" is the task of generating a plausible video of a talking head from a source image or video by mimicking the movements and facial expressions of a reference video (Zakharov et al [2019]), while lip syncing consists of synchronizing lip movements in a video to match a target speech segment (Prajwal et al [2020]). Head puppetry and lip syncing are both forms of motion transfer, which refers more broadly to the task of mapping the motion of a given individual in a source video to the motion of another individual in a target image or video (Zhu et al [2021], Kappel et al [2021]). Face swapping, head puppetry, and lip syncing are commonly referred to as "deepfakes" because they can be used to usurp someone's identity in a video; however, they involve distinct generation pipelines.…”
Section: Local Partially Synthetic DLSAM
confidence: 99%
“…[11,34] or videos of real people, e.g. [23,14,20]. Some works have explored transferring style between rigid 3D objects.…”
Section: Related Work
confidence: 99%