2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2021
DOI: 10.1109/cvpr46437.2021.01124

ChallenCap: Monocular 3D Capture of Challenging Human Performances using Multi-Modal References

Abstract: Capturing challenging human motions is critical for numerous applications, but it suffers from complex motion patterns and severe self-occlusion under the monocular setting. In this paper, we propose ChallenCap, a template-based approach to capture challenging 3D human motions using a single RGB camera in a novel learning-and-optimization framework, with the aid of multi-modal references. We propose a hybrid motion inference stage with a generation network, which utilizes a temporal encoder-decoder to extract …

Cited by 26 publications (9 citation statements)
References: 77 publications
“…Besides, TCMR [114] applies two more GRUs to forecast additional temporal features for the current target pose from the past and future frames. Based on GRUs and a motion discriminator, ChallenCap [198] utilizes multi-modal and hybrid motion references to capture challenging human motions. Lee et al [147] consider the uncertainty-aware embedding and include optical flow information.…”
Section: Recovery From Monocular Videos (mentioning)
confidence: 99%
“…MPoser, an extension of VPoser [22] to temporal sequences, is based on sequential VAE. Inspired by VIBE, He et al [198] generate marker-based motion maps as input to a discriminator to obtain an adversarial motion prior. In HuMoR [158], the probability distribution of possible state transitions is formulated by a conditional variational autoencoder (CVAE).…”
Section: Motion Prior (mentioning)
confidence: 99%
“…There are many real-time motion capture solutions available and we adopt the recent single camera technique [He et al 2021] for convenience. It is able to detect 21 key points of skeletons.…”
Section: D Vs 3d Rendering (mentioning)
confidence: 99%
“…In this specific case, a user should not only be able to omnidirectionally watch the virtual trainer's moves but also compare their own moves with the trainer. In our implementation, we use a single camera motion capture solution [He et al 2021] that estimates 3D skeleton structures of users as they move. We also precompute the "ground truth" skeleton moves of the trainer, by first rendering a multiview video of whole body movements also using NeuVV and then conducting multi-view skeleton estimation.…”
Section: Ground Truth (mentioning)
confidence: 99%
“…To capture humans in challenging poses and motion, ChallenCap [7] uses a learning-and-optimization framework, which learns motion characteristics and is trained on "a new challenging human motion dataset with both unsynchronized marker-based and light-weight multiimage references" [7]. First, a noisy skeletal motion map is acquired, this is then processed in their temporal encoder-decoder and generation network HybridNet and their motion discriminator, which is followed by a robust motion optimization phase, which uses 2D keypoint and silhouette information from the video frames.…”
Section: Fitting a Body Model To Image Sequences (mentioning)
confidence: 99%