2018
DOI: 10.1007/978-3-030-01216-8_44

Multi-Scale Structure-Aware Network for Human Pose Estimation

Abstract: We develop a robust multi-scale structure-aware neural network for human pose estimation. This method improves the recent deep conv-deconv hourglass models with four key improvements: (1) multiscale supervision to strengthen contextual feature learning in matching body keypoints by combining feature heatmaps across scales, (2) multiscale regression network at the end to globally optimize the structural matching of the multi-scale features, (3) structure-aware loss used in the intermediate supervision and at th…

Cited by 194 publications (150 citation statements)
References: 21 publications
“…An upsample process can be used to gradually recover the high-resolution representations from the low-resolution representations. The upsample subnetwork could be a symmetric version of the downsample process (e.g., VGGNet), with skipping connection over some mirrored layers to transform the pooling indices, e.g., SegNet [3] and DeconvNet [85], or copying the feature maps, e.g., U-Net [95] and Hourglass [6], [7], [21], [24], [51], [83], [109], [131], [132], encoder-decoder [90], and so on. An extension of U-Net, full-resolution residual network [92], introduces an extra full-resolution stream that carries information at the full image resolution, to replace the skip connections, and each unit in the downsample and upsample subnetworks receives information from and sends information to the full-resolution stream.…”
Section: Related Work
Citation type: mentioning
confidence: 99%
“…Human pose estimation is a problem of localizing human body part locations in an input image. Most of the current works [34,10,45,46,28,42] use a deep convolutional neural network and generate the output as a 2D heatmap, which is encoded as a gaussian map centered at each body part location. Hourglass network [34] exploits the iterative refinements on the predictions from the repeated encoder-decoder architecture design to capture complex spatial relationships.…”
Section: Related Work
Citation type: mentioning
confidence: 99%
“…Even with deep architectures, disambiguating look-alike body parts remain as a main problem [39] in pose estimation community. Recent methods [46,11,28], built on top of the hourglass network, use multi-scale and body part structure information to improve the performance by adding more architectural components.…”
Section: Related Work
Citation type: mentioning
confidence: 99%
“…The intermediate supervision at each hourglass module benefits from previous module outputs, refining and improving final network predictions. Given its high performance, its conceptual simplicity, and that allows for an easy multitask integration among stacked modules, this architecture is serving as a baseline model in several works [30], [31], [32], [33], [34].…”
Section: A Multi-task Architecture
Citation type: mentioning
confidence: 99%