2022
DOI: 10.1109/tcsvt.2021.3057267

Anatomy-Aware 3D Human Pose Estimation With Bone-Based Pose Decomposition

Abstract: In this work, we propose a new solution for 3D human pose estimation in videos. Instead of directly regressing the 3D joint locations, we draw inspiration from the human skeleton anatomy and decompose the task into bone direction prediction and bone length prediction, from which the 3D joint locations can be completely derived. Our motivation is the fact that the bone lengths of a human skeleton remain consistent across time. This promotes us to develop effective techniques to utilize global information across…
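The abstract states that 3D joint locations can be derived from bone directions and bone lengths. The sketch below illustrates that derivation only, not the paper's networks. It assumes a hypothetical skeleton given as a parent-index list, with joints ordered so that every parent precedes its children; the function name `joints_from_bones` and the three-joint toy skeleton are illustrative and do not come from the paper.

```python
import numpy as np


def joints_from_bones(parents, directions, lengths, root_position=(0.0, 0.0, 0.0)):
    """Rebuild 3D joint positions from per-bone directions and lengths.

    parents[j] is the index of joint j's parent, or -1 for the root.
    directions[j] and lengths[j] describe the bone from parents[j] to j
    (the root's entries are ignored). Assumption: parents come before
    their children in the joint ordering.
    """
    directions = np.asarray(directions, dtype=float)
    lengths = np.asarray(lengths, dtype=float)
    joints = np.zeros((len(parents), 3))

    for j, p in enumerate(parents):
        if p < 0:
            # The root has no incoming bone; place it directly.
            joints[j] = root_position
            continue
        # Normalize so that the length alone controls the bone's extent.
        unit = directions[j] / np.linalg.norm(directions[j])
        joints[j] = joints[p] + lengths[j] * unit
    return joints


# Toy chain: root -> joint 1 (straight up) -> joint 2 (along x).
parents = [-1, 0, 1]
directions = [[0, 0, 0], [0, 0, 2], [1, 0, 0]]
lengths = [0.0, 0.5, 0.3]
pose = joints_from_bones(parents, directions, lengths)
```

Because each bone length is a separate scalar, the same `lengths` array could be reused for every frame of a video while only `directions` changes, which matches the abstract's observation that bone lengths stay consistent across time.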

Citations: cited by 137 publications (83 citation statements)
References: 44 publications (69 reference statements)

Citation statements
“…In the work of Chen et al [83], they design a joint shift loss to ensure consistency between predicted bone lengths and directions.…”
Section: Regularization
Citation type: mentioning
Confidence: 99%
“…A grounded spatial-temporal learning framework was proposed in [20] to leverage both the temporal context in the video sequence and the spatial information in the graph-based skeleton. In the light of exploring spatial-temporal learning, [21] use the entire video as the context for predicting the bone direction along with a consistent bone length across the entire video. [22] proposed a multi-step refinement and estimation framework that refines the 2D input keypoint sequence and then concurrently considering the structure of 2D inputs and 3D outputs.…”
Section: 3D Pose Estimation
Citation type: mentioning
Confidence: 99%
“…We have evaluated the methods proposed in [2,1] which focus on 3D pose estimation from a single image. For a fair comparison, we also compete with methods that leverage a video sequence as input [6,5,20,21,14,49]. It is important to note that a recently proposed work [14] leverages extra data augmentation step to improve the 2D keypoint detection result, which further pushed the performance.…”
Section: Evaluation On Human36m Dataset 3d Pose Estimation In Video
Citation type: mentioning
Confidence: 99%
“…This task is typically solved by dividing it into two decoupled subtasks, i.e., 2D pose detection to localize the keypoints on the image plane, followed by 2D-to-3D lifting to infer joint locations in the 3D space from 2D poses. Despite their impressive performance [4,9,28,34], it remains an inher-Figure 1. Given a frame with occluded body parts (right arm and elbow), a recent state-of-the-art 3D HPE method, PoseFormer [47], outputs a single solution that is inconsistent with the 2D input.…”
Section: Introduction
Citation type: mentioning
Confidence: 99%