2016
DOI: 10.1007/978-3-319-46478-7_44
|View full text |Cite
|
Sign up to set email alerts
|

Human Pose Estimation via Convolutional Part Heatmap Regression

Abstract: This paper is on human pose estimation using Convolutional Neural Networks. Our main contribution is a CNN cascaded architecture specifically designed for learning part relationships and spatial context, and robustly inferring pose even for the case of severe part occlusions. To this end, we propose a detection-followed-by-regression CNN cascade. The first part of our cascade outputs part detection heatmaps and the second part performs regression on these heatmaps. The benefits of the proposed architecture are… Show more

Help me understand this report
View preprint versions

Search citation statements

Order By: Relevance

Paper Sections

Select...
3
2

Citation Types

0
344
0
1

Year Published

2016
2016
2018
2018

Publication Types

Select...
6
2

Relationship

1
7

Authors

Journals

citations
Cited by 415 publications
(345 citation statements)
references
References 40 publications
0
344
0
1
Order By: Relevance
“…Recently, methods based on CNNs have been shown to produce stateof-the-art results for many Computer Vision tasks like image recognition [23], object detection [11] and semantic image segmentation [18]. In the context of landmark localisation, it is natural to formulate the problem as a regression one in which CNN features are regressed in order to provide a joint prediction of the landmarks, see for example recent works on human pose estimation [3,5,20,25]. The idea of joint regression of part detection scoremaps for localisation has been explored in [5], however in the context of human pose estimation.…”
Section: Related Workmentioning
confidence: 99%
“…Recently, methods based on CNNs have been shown to produce stateof-the-art results for many Computer Vision tasks like image recognition [23], object detection [11] and semantic image segmentation [18]. In the context of landmark localisation, it is natural to formulate the problem as a regression one in which CNN features are regressed in order to provide a joint prediction of the landmarks, see for example recent works on human pose estimation [3,5,20,25]. The idea of joint regression of part detection scoremaps for localisation has been explored in [5], however in the context of human pose estimation.…”
Section: Related Workmentioning
confidence: 99%
“…Single person pose estimation in images has seen a remarkable progress over the past few years [39,30,6,40,14,16,27,5,32]. However, all these approaches assume that only a single person is visible in the image, and cannot handle realistic cases where several people appear in the scene, and interact with each other.…”
Section: Related Workmentioning
confidence: 99%
“…The field of human pose estimation in images has progressed remarkably over the past few years. The methods have advanced from pose estimation of single pre-localized persons [30,6,40,14,16,27,5,32] to the more challenging and realistic case of multiple, potentially overlapping and truncated persons [12,8,30,16,17]. Many applications, such as mentioned before, however, aim to analyze human body motion over time.…”
Section: Introductionmentioning
confidence: 99%
“…This is mainly due to the availability of deep learning based methods for detecting joints [1][2][3][4][5]. While earlier approaches in this direction [4,6,7] combine the body part detectors with tree structured graphical models, more recent methods [1][2][3][8][9][10] demonstrate that spatial relations between joints can be directly learned by a neural network without the need of an additional graphical model. These approaches, however, assume that only a single person is visible in the image and the location of the person is known a-priori.…”
Section: Introductionmentioning
confidence: 99%
“…In [3,8,9] multi-staged CNN architectures are proposed where each stage of the network takes as input the score maps of all parts from its preceding stage. This provides additional information about the interdependence, co-occurrence, and context of parts to each stage, and thereby allows the network to implicitly learn image dependent spatial relationships between parts.…”
Section: Related Workmentioning
confidence: 99%