2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2019
DOI: 10.1109/cvpr.2019.01012

Generating Multiple Hypotheses for 3D Human Pose Estimation With Mixture Density Network

Abstract: 3D human pose estimation from a monocular image or 2D joints is an ill-posed problem because of depth ambiguity and occluded joints. We argue that 3D human pose estimation from a monocular input is an inverse problem where multiple feasible solutions can exist. In this paper, we propose a novel approach to generate multiple feasible hypotheses of the 3D pose from 2D joints. In contrast to existing deep learning approaches which minimize a mean square error based on a unimodal Gaussian distribution, our method…
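The contrast the abstract draws, between a mean-square-error objective (a single Gaussian) and a mixture density that can hold several pose hypotheses, can be illustrated with a small sketch. This is not the paper's implementation: the abstract is truncated here, so the isotropic Gaussian kernels, the function name `mdn_negative_log_likelihood`, and all parameter names and toy values below are assumptions chosen for illustration only.

```python
import numpy as np


def mdn_negative_log_likelihood(pose, weights, means, sigmas):
    """Negative log-likelihood of one pose under an isotropic Gaussian mixture.

    Hypothetical helper for illustration (not taken from the paper).
    pose:    (D,)   target pose vector, e.g. D = 3 * number of joints
    weights: (M,)   mixing coefficients, positive and summing to 1
    means:   (M, D) one candidate pose (hypothesis) per mixture component
    sigmas:  (M,)   per-component standard deviation (isotropic by assumption)
    """
    pose = np.asarray(pose, dtype=float)
    weights = np.asarray(weights, dtype=float)
    means = np.asarray(means, dtype=float)
    sigmas = np.asarray(sigmas, dtype=float)
    dim = pose.shape[0]

    # Squared distance from the target to each component mean, shape (M,).
    sq_dist = np.sum((means - pose) ** 2, axis=1)

    # Log-density of an isotropic Gaussian for each component.
    log_kernel = (
        -0.5 * sq_dist / sigmas**2
        - dim * np.log(sigmas)
        - 0.5 * dim * np.log(2.0 * np.pi)
    )

    # Combine components with log-sum-exp for numerical stability.
    log_terms = np.log(weights) + log_kernel
    top = np.max(log_terms)
    return -(top + np.log(np.sum(np.exp(log_terms - top))))


# Toy depth ambiguity: the same 2D input is consistent with a joint
# lying in front of (+1) or behind (-1) the image plane.
front = np.array([0.0, 0.0, 1.0])
back = np.array([0.0, 0.0, -1.0])

# Two-component mixture with one hypothesis per feasible pose.
nll_mixture = mdn_negative_log_likelihood(
    front, weights=[0.5, 0.5], means=[front, back], sigmas=[0.1, 0.1]
)

# Single Gaussian centred on the average of the two poses, which is
# where a mean-square-error fit to both targets would settle.
nll_unimodal = mdn_negative_log_likelihood(
    front, weights=[1.0], means=[(front + back) / 2.0], sigmas=[0.1]
)
```

In this toy setup the mixture assigns the true pose a far lower negative log-likelihood than the single averaged Gaussian, because the average of two feasible poses is itself not a feasible pose.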

Cited by 171 publications (120 citation statements)
References 27 publications
“…and ambiguity. On the task of 3D human pose estimation from a single RGB image, [11] generated multiple feasible hypotheses of 3D pose from 2D joints to alleviate the problems resulting from depth ambiguity and occluded joints. Following [7], [31] built up a proposal propagation tree to maintain multiple object proposal hypotheses for each object across time steps to perform data association globally in the semi-supervised video object segmentation task.…”
Section: Related Work (mentioning)
confidence: 99%
“…Instead of using the relative position between the 3D joints and the root joint (pelvis) as a ground truth, they have shown that using the relative positions with respect to multiple joints improves their learning. Li and Lee [38] learn a mixture density network [6] to generate multiple possible 3D pose hypotheses from a single monocular image. Another approach that generates the hypotheses is a deep pose consensus approach [10].…”
Section: Deep Learning (mentioning)
confidence: 99%
“…Another approach that generates the hypotheses is a deep pose consensus approach [10]. In contrast to [38], it generates partial hypotheses, for each group of the joints. The estimated joints are aggregated into poses in the final part of the model.…”
Section: Deep Learning (mentioning)
confidence: 99%
“…In [1], the authors co-embed the language and corresponding motions to a shared manifold, ignoring the fact that language-to-motion is a one-to-many mapping. Even with a specific control signal, like a 2D human skeleton, one can still expect that there are different motions or different poses corresponding to the same control signal [29], essentially indicating the multi-modal nature of human motion dynamics.…”
Section: Deterministic Human Motion Prediction and Synthesis (mentioning)
confidence: 99%