Monocular 3D Human Pose Estimation by Generation and Ordinal Ranking

Sharma, Saurabh; Varigonda, Pavan Teja; Bindal, Prashast; Sharma, Abhishek; Jain, Arjun

doi:10.1109/iccv.2019.00241

Cited by 140 publications

(78 citation statements)

References 24 publications


“…PA MPJPE: Chen et al [11] 82.7; Moreno et al [32] 76.5; Zhou et al [56] 55.3; Sun et al [47] 48.3; Sharma et al [44] 40.9; Sun et al [48] 40.6; Moon et al [31] 34.0; Baseline* 34.7; Baseline 2** 34.3; ours 33.1. Table A1. 3-D human pose estimation evaluation on the Human3.6M dataset using Protocol I.…”

Section: Methods (mentioning)

confidence: 99%
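
The "PA MPJPE" column in the excerpt above is commonly read as Procrustes-aligned mean per-joint position error: the prediction is first aligned to the ground truth with a similarity transform (rotation, uniform scale, translation), and the mean joint distance is measured afterwards. The sketch below illustrates that usual definition with NumPy. The function names are hypothetical, and the citing paper's exact evaluation code is not shown in the excerpt, so this is an assumption about the metric rather than a reproduction of it.

```python
import numpy as np


def mpjpe(pred, gt):
    """Mean Euclidean distance between corresponding joints, shape (J, 3)."""
    return float(np.mean(np.linalg.norm(pred - gt, axis=-1)))


def procrustes_align(pred, gt):
    """Align pred to gt with the least-squares similarity transform."""
    mu_pred, mu_gt = pred.mean(axis=0), gt.mean(axis=0)
    x, y = pred - mu_pred, gt - mu_gt

    # SVD of the cross-covariance gives the optimal rotation.
    u, s, vt = np.linalg.svd(x.T @ y)

    # Flip the last singular direction if needed so the result is a
    # proper rotation (determinant +1) and not a reflection.
    d = np.ones(3)
    d[-1] = np.sign(np.linalg.det(u @ vt))
    rotation = u @ np.diag(d) @ vt

    # Optimal uniform scale for the rotated, centred prediction.
    scale = float((s * d).sum() / (x ** 2).sum())
    return scale * x @ rotation + mu_gt


def pa_mpjpe(pred, gt):
    """MPJPE after Procrustes alignment of the prediction."""
    return mpjpe(procrustes_align(pred, gt), gt)
```

A prediction that differs from the ground truth only by a rotation, a uniform scale, and a translation scores zero under `pa_mpjpe` (up to floating-point error) while still scoring a non-zero plain `mpjpe`, which is why the two numbers are reported separately.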

MEBOW: Monocular Estimation of Body Orientation in the Wild

Chen, Luo, et al. 2020

2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)


Body orientation estimation provides crucial visual cues in many applications, including robotics and autonomous driving. It is particularly desirable when 3-D pose estimation is difficult to infer due to poor image resolution, occlusion, or indistinguishable body parts. We present COCO-MEBOW (Monocular Estimation of Body Orientation in the Wild), a new large-scale dataset for orientation estimation from a single in-the-wild image. The body-orientation labels for around 130K human bodies within 55K images from the COCO dataset have been collected using an efficient and high-precision annotation pipeline. We also validated the benefits of the dataset. First, we show that our dataset can substantially improve the performance and the robustness of a human body orientation estimation model, the development of which was previously limited by the scale and diversity of the available training data. Additionally, we present a novel triple-source solution for 3-D human pose estimation, where 3-D pose labels, 2-D pose labels, and our body-orientation labels are all used in joint training. Our model significantly outperforms state-of-the-art dual-source solutions for monocular 3-D human pose estimation, where training only uses 3-D pose labels and 2-D pose labels. This substantiates an important advantage of MEBOW for 3-D human pose estimation, which is particularly appealing because the per-instance labeling cost for body orientations is far less than that for 3-D poses. The work demonstrates high potential of MEBOW in addressing real-world challenges involving understanding human behaviors. Further information of this work is available at https://chenyanwu.github.io/MEBOW/ .


“…At the same time, some works use a deep CNN [27, 28]. (iii) The deep learning-based approaches [12, 14, 27, 28, 31, 32] which do not rely on hand-crafted features/descriptors but learn features and mapping to 3D human poses directly. (iv) There also exist hybrid approaches [6, 33, 34] that combine together the generative as well as discriminative methods.…”

Section: Related Work (mentioning)

confidence: 99%

“…Hence, to meet the massive demand for MoCap data, many research works have been performed to infer 3D human poses from internet-based in-the-wild real 2D images/videos [7, 8, 9, 10, 11, 12, 13]. Due to the curse of dimensionality and the ill-posed nature [14], there are open challenges connected to lifting 2D poses up to 3D poses. 3D human motion capturing from in-the-wild 2D pictures and videos will empower many vision-dependent applications such as health rehabilitation-based industries, robotics, virtual reality, entertainment, surveillance systems, and human-computer interaction [15].…”

Section: Introduction (mentioning)

confidence: 99%

An Efficient 3D Human Pose Retrieval and Reconstruction from 2D Image-Based Landmarks

Yasin, Krüger 2021

Sensors


We propose an efficient and novel architecture for 3D articulated human pose retrieval and reconstruction from 2D landmarks extracted from a 2D synthetic image, an annotated 2D image, an in-the-wild real RGB image or even a hand-drawn sketch. Given 2D joint positions in a single image, we devise a data-driven framework to infer the corresponding 3D human pose. To this end, we first normalize 3D human poses from Motion Capture (MoCap) dataset by eliminating translation, orientation, and the skeleton size discrepancies from the poses and then build a knowledge-base by projecting a subset of joints of the normalized 3D poses onto 2D image-planes by fully exploiting a variety of virtual cameras. With this approach, we not only transform 3D pose space to the normalized 2D pose space but also resolve the 2D-3D cross-domain retrieval task efficiently. The proposed architecture searches for poses from a MoCap dataset that are near to a given 2D query pose in a definite feature space made up of specific joint sets. These retrieved poses are then used to construct a weak perspective camera and a final 3D posture under the camera model that minimizes the reconstruction error. To estimate unknown camera parameters, we introduce a nonlinear, two-fold method. We exploit the retrieved similar poses and the viewing directions at which the MoCap dataset was sampled to minimize the projection error. Finally, we evaluate our approach thoroughly on a large number of heterogeneous 2D examples generated synthetically, 2D images with ground-truth, a variety of real in-the-wild internet images, and a proof of concept using 2D hand-drawn sketches of human poses. We conduct a pool of experiments to perform a quantitative study on PARSE dataset. We also show that the proposed system yields competitive, convincing results in comparison to other state-of-the-art methods.
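
Two steps named in the abstract above, pose normalization and projection under a weak perspective camera, can be illustrated with a small NumPy sketch. It is not the authors' pipeline: the function names, the choice of joint 0 as root, and the use of the largest root-to-joint distance as the skeleton size are all assumptions made for illustration, and the orientation normalization the abstract mentions is omitted because it depends on joint conventions not given here.

```python
import numpy as np


def normalize_pose(pose3d, root=0):
    """Remove translation and overall size from a (J, 3) pose.

    Assumption: the root joint index and the size measure (largest
    distance from the root) are illustrative choices only.
    """
    centered = pose3d - pose3d[root]
    size = np.linalg.norm(centered, axis=1).max()
    return centered / size if size > 0 else centered


def weak_perspective_project(pose3d, rotation, scale, translation):
    """Scaled orthographic projection of a (J, 3) pose to (J, 2).

    rotation: (3, 3) camera rotation, scale: scalar,
    translation: (2,) image-plane offset.
    """
    cam = pose3d @ rotation.T          # rotate into the camera frame
    return scale * cam[:, :2] + translation  # drop depth, scale, shift
```

Under this camera model depth only affects the image through the single `scale` factor, which is what makes it a convenient model when only 2D landmarks are available.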


“…performed by 7 actors. Following [6, 7, 4, 12, 13, 11, 22, 19], we adopt a 17-joint skeleton, train on five subjects (S1, S5, S6, S7, S8), and test on two subjects (S9 and S11). Following [4], we apply the same pre-processing to ground truth annotations.…”

Section: Dataset (mentioning)

confidence: 99%
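
The subject split quoted above (five training subjects, two test subjects, 17 joints) can be written down as a small configuration. This is a minimal sketch: the `subject` key and the sample layout are hypothetical, since the excerpt does not describe how the citing paper stores its data.

```python
# Subject IDs and joint count are taken from the quoted excerpt.
TRAIN_SUBJECTS = ("S1", "S5", "S6", "S7", "S8")
TEST_SUBJECTS = ("S9", "S11")
NUM_JOINTS = 17


def split_by_subject(samples):
    """Split samples into (train, test) lists by their subject ID.

    Assumption: each sample is a dict with a "subject" key.
    """
    train = [s for s in samples if s["subject"] in TRAIN_SUBJECTS]
    test = [s for s in samples if s["subject"] in TEST_SUBJECTS]
    return train, test
```

Splitting by subject rather than by frame keeps every test actor unseen during training.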

“…Different hypothesis generation approaches like Bayesian framework [10], Gaussian Mixture Model [11], Variational Autoencoder [12] have been proposed in recent years. However, end-to-end encoder-decoder network has not been explored much for hypothesis generation in 3D pose estimation problem.…”

Section: Introduction (mentioning)

confidence: 99%

Monocular 3D Human Pose Estimation by Multiple Hypothesis Prediction and Joint Angle Supervision

Panda, Mukherjee 2021

2021 IEEE International Conference on Image Processing (ICIP)


Human pose estimation in 3D from monocular images is a challenging inverse problem due to ambiguity in lifting 2D projection to 3D space. In this article we have made three contributions in order to solve 3D pose estimation. First, a new DNN architecture is proposed to generate multiple feasible 3D pose hypotheses from a given image. Second, we generate weights for the proposed hypotheses using ordinal supervision. These weights are used to predict the final 3D pose from the generated hypotheses. Finally, we report a new regularizer to enforce that the predicted skeleton is consistent with the restriction of anthropomorphic constraints. We compare the results of our algorithm with other state-of-the-art approaches on the Human3.6M benchmark dataset. Our algorithm reports competitive results.
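
The abstract above says that weights over the generated hypotheses are used to predict the final 3D pose. One simple way such a step could look is a weighted average of the hypotheses, sketched below with NumPy. The softmax normalization and the function name are assumptions for illustration; the abstract does not state how the weights are normalized or combined.

```python
import numpy as np


def combine_hypotheses(hypotheses, scores):
    """Weighted mean of K pose hypotheses.

    hypotheses: (K, J, 3) candidate 3D poses.
    scores:     (K,) unnormalized per-hypothesis scores.
    Assumption: scores are turned into weights with a softmax.
    """
    shifted = scores - np.max(scores)      # for numerical stability
    weights = np.exp(shifted)
    weights = weights / weights.sum()
    # Contract the K axis: result has shape (J, 3).
    return np.tensordot(weights, hypotheses, axes=1)
```

With equal scores this reduces to the plain mean of the hypotheses, and when one score dominates the output approaches that single hypothesis.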

