2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2021
DOI: 10.1109/cvpr46437.2021.01584

Graph Stacked Hourglass Networks for 3D Human Pose Estimation

Abstract: In this paper, we propose a novel graph convolutional network architecture, Graph Stacked Hourglass Networks, for 2D-to-3D human pose estimation tasks. The proposed architecture consists of repeated encoder-decoder, in which graph-structured features are processed across three different scales of human skeletal representations. This multiscale architecture enables the model to learn both local and global feature representations, which are critical for 3D human pose estimation. We also introduce a multi-level f…

Citations: cited by 144 publications (63 citation statements)
References: 44 publications (79 reference statements)

“…They are also superior to RGB data in terms of computational cost. Due to these advantages and the progress of sensors (Zhang 2012) and pose estimation models (Wandt et al. 2021; Xu and Takano 2021), various models have been proposed (Yan, Xiong, and Lin 2018; Zhang et al. 2020; Cheng et al. 2021; Kong, Deng, and Jiang 2021).…”
Section: Related Work (mentioning)
confidence: 99%
“…The probability for a pixel to be the keypoint can be measured by its response in the heatmap. Recently, heatmap-based approaches have achieved the state-of-the-art performance in pose estimation [32,4,34,27]. The coordinates of keypoints are obtained by decoding the heatmaps [25].…”
Section: Related Work and Major Contributions (mentioning)
confidence: 99%
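
The heatmap decoding mentioned in the statement above can be illustrated with a minimal sketch. It assumes the simplest possible scheme, taking the pixel with the highest response as the keypoint; the function name `decode_heatmap` and the toy heatmap are hypothetical and are not taken from the cited works, which may use more refined decoding (for example, sub-pixel refinement).

```python
from typing import List, Tuple


def decode_heatmap(heatmap: List[List[float]]) -> Tuple[int, int, float]:
    """Return (x, y, score) of the highest-response pixel in a 2D heatmap.

    Assumption: plain argmax decoding, with ties resolved in favour of the
    first pixel found in row-major order.
    """
    best_x, best_y, best_score = 0, 0, float("-inf")
    for y, row in enumerate(heatmap):
        for x, value in enumerate(row):
            if value > best_score:
                best_x, best_y, best_score = x, y, value
    return best_x, best_y, best_score


# Hypothetical 3x3 heatmap for a single keypoint.
heatmap = [
    [0.0, 0.1, 0.0],
    [0.1, 0.9, 0.2],
    [0.0, 0.3, 0.1],
]
x, y, score = decode_heatmap(heatmap)
print(x, y, score)
```

The returned score is the heatmap response at the chosen pixel, which matches the statement's reading of the response as a measure of how likely that pixel is to be the keypoint.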
“…[4] predicted scale-aware high-resolution heatmaps using multi-resolution aggregation during inference. [34] processed graph-structured features across multi-scale human skeletal representations and proposed a learning approach for multi-level feature learning and heatmap estimation.…”
Section: Related Work and Major Contributions (mentioning)
confidence: 99%
“…Given the 3D supervision, the spatial relationship can be directly learned via supervised learning [6,20,55,59]. Various representations have been proposed to effectively encode the spatial relationship such as volumetric representation [43], graph structure [4,11,71,76], transformer architecture [31,34,74], compact designs for realtime reconstruction [37,38], and inverse kinematics [30]. These supervised learning approaches that rely on the 3D ground truth supervision, however, show limited generalization to images of out-of-distribution scenes and poses due to the domain gap.…”
Section: Related Work (mentioning)
confidence: 99%