2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition 2018
DOI: 10.1109/cvpr.2018.00742

Cascaded Pyramid Network for Multi-person Pose Estimation

Abstract: The topic of multi-person pose estimation has been largely improved recently, especially with the development of convolutional neural network. However, there still exist a lot of challenging cases, such as occluded keypoints, invisible keypoints and complex background, which cannot be well addressed. In this paper, we present a novel network structure called Cascaded Pyramid Network (CPN) which targets to relieve the problem from these "hard" keypoints. More specifically, our algorithm includes two stages: Glo…

citations
Cited by 1,289 publications
(958 citation statements)
references
References 38 publications
“…RefineNet [70] improves the combination of upsampled representations and the representations of the same resolution copied from the downsample process. Other works include: light upsample process [5], [19], [72], [124], possibly with dilated convolutions used in the backbone [47], [69], [91]; light downsample and heavy upsample processes [115], recombinator networks [40]; improving skip connections with more or complicated convolutional units [48], [89], [143], as well as sending information from low-resolution skip connections to high-resolution skip connections [151] or exchanging information between them [34]; studying the details of the upsample process [120]; combining multi-scale pyramid representations [18], [125]; stacking multiple DeconvNets/U-Nets/Hourglass [31], [122] with dense connections [110].…”
Section: Related Work (mentioning)
confidence: 99%
“…The main focus of this work is to build a pose-aware relation classifier for predicting the relation score $s^a_{h,o}$ given a $x_h$, $x_o$ pair. To achieve this, we first apply an off-the-shelf pose estimator [3] to a cropped region of proposal $x_h$, which generates a pose vector $p_h = \{p^1_h, \dots, p^K_h\}$, where $p^k_h \in \mathbb{R}^2$ is k-th joint location and $K$ is the number of all joints. In order to incorporate interaction context, human-object and detailed semantic part cues into relation inference, we then introduce a multi-branch deep neural network to generate Zoom-in module uses human part information and attention mechanism to capture more details.…”
Section: Overview (mentioning)
confidence: 99%
“…3.2 in an end-to-end manner. Note that the object detector (Faster R-CNN [21]) and the pose estimator (CPN [3]) are external modules and thus do not participate in learning process.…”
Section: Model Learning (mentioning)
confidence: 99%
“…Top-down approaches (Fang et al, 2017; Huang et al, 2017; Papandreou et al, 2017; Chen et al, 2018) are opposed to the former, locating and partitioning all persons in the image followed by utilizing single person pose estimation caches individually for each person. Cascaded Pyramid Network (CPN) (Chen et al, 2018) takes two steps to cope with overlapping or obscured keypoints: GlobalNet for easy recognized keypoints and RefineNet for hard one. Papandreou et al (2017) leverages the Faster RCNN (Ren et al, 2015) as the person detector and the fully convolutional ResNet to predict heatmaps and offsets.…”
Section: Top-down Approaches (mentioning)
confidence: 99%
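
The top-down flow described in the citation statement above (locate each person, then run a single-person pose estimator per person) can be sketched roughly as follows. This is only an illustrative sketch, not the CPN implementation: the names `crop_to_image`, `top_down_poses`, and `dummy_estimator` are hypothetical, the person boxes are assumed to come from some external detector, and the estimator is assumed to return keypoints in normalized crop coordinates.

```python
from typing import Callable, List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in image pixels
Keypoint = Tuple[float, float]           # (x, y)


def crop_to_image(box: Box, kps_norm: List[Keypoint]) -> List[Keypoint]:
    """Map keypoints from normalized crop coordinates in [0, 1] to image pixels."""
    x0, y0, x1, y1 = box
    w, h = x1 - x0, y1 - y0
    return [(x0 + u * w, y0 + v * h) for u, v in kps_norm]


def top_down_poses(
    boxes: List[Box],
    estimate_single: Callable[[Box], List[Keypoint]],
) -> List[List[Keypoint]]:
    """Run a single-person estimator on each detected box (one pose per person)."""
    return [crop_to_image(box, estimate_single(box)) for box in boxes]


def dummy_estimator(box: Box) -> List[Keypoint]:
    """Hypothetical stand-in for a real pose network: two fixed joints per crop."""
    return [(0.5, 0.25), (0.5, 0.75)]


# Two hypothetical person boxes, as a detector might produce them.
poses = top_down_poses([(10, 20, 110, 220), (200, 50, 260, 170)], dummy_estimator)
for person_id, pose in enumerate(poses):
    print(person_id, pose)
```

In a real system the dummy estimator would be replaced by a trained single-person network, and the boxes would come from a detector such as the ones named in the statements above.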