CVPR 2011
DOI: 10.1109/cvpr.2011.5995316

Real-time human pose recognition in parts from single depth images

Abstract: We propose a new method to quickly and accurately predict 3D positions of body joints from a single depth image, using no temporal information. We take an object recognition approach, designing an intermediate body parts representation that maps the difficult pose estimation problem into a simpler per-pixel classification problem. Our large and highly varied training dataset allows the classifier to estimate body parts invariant to pose, body shape, clothing, etc. Finally we generate confidence-scored 3D propo…

Cited by 3,215 publications (2,368 citation statements)
References 29 publications

“…Also during our experiments we observed that a curtain, the operating bed, a server rack and a stand for an ultrasound device were infrequently detected as a person. To overcome these shortcomings, in future iterations the Microsoft Kinect V2 camera with the Microsoft algorithms [26] will be used. The results show that moving persons in the operating theater can be detected in very short time; however, it has to be taken into account that the data are adjusted using the delay (rounded 1 s) resulting in higher values for detection time.…”
Section: Discussion (mentioning)
confidence: 99%
“…We use all frames of actor 1 to construct the test set (≈ 62k poses). The H36M skeleton includes some spurious joints that we delete, which results in the same 20 joints present in the Kinect skeleton [20]. All frames are given in relative xyz coordinates centered at the hip node, unless otherwise stated.…”
Section: Methods (mentioning)
confidence: 99%
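
The hip-centered coordinates mentioned in the statement above can be illustrated with a minimal sketch. It assumes a pose is a list of (x, y, z) joint positions and that the hip joint sits at index 0; both the function name `center_on_hip` and the joint ordering are hypothetical and not taken from the cited work.

```python
from typing import List, Sequence, Tuple

Joint = Tuple[float, float, float]


def center_on_hip(pose: Sequence[Joint], hip_index: int = 0) -> List[Joint]:
    """Return the pose with every joint expressed relative to the hip joint."""
    hx, hy, hz = pose[hip_index]
    return [(x - hx, y - hy, z - hz) for (x, y, z) in pose]


# Hypothetical 3-joint pose (hip, spine, head) in metres, camera coordinates.
pose = [(0.5, 1.0, 2.0), (0.5, 1.5, 2.0), (0.75, 1.75, 2.25)]
centered = center_on_hip(pose)
print(centered)
```

After centering, the hip maps to the origin, so poses recorded at different positions in the room become directly comparable.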
“…Reasoning about human pose is a key ingredient in recent successful applications of computer vision systems [20]. Accurately capturing the variability of human pose is challenging because there is both a variation between different persons as well as a combinatorial number of possible poses a single person can assume.…”
Section: Introduction (mentioning)
confidence: 99%
“…Another reason which inconsistent positions are estimated because it's not tracked. Artificial vibration is result of the acquisition error from camera which produces unwanted vibrations on the extracted joint-position and makes length of bones change during the motion [8]. They apply object recognition approach that produce designing an intermediate body parts representation that maps the difficult pose estimation problem into a simpler per-pixel classification problem.…”
Section: Analysis Of "Le Sas" (mentioning)
confidence: 99%
“…Then generate confidence-scored 3D proposals of several body joints by reprojecting the classification result and finding local modes. They use consumer hardware which runs at 200 frames per second, this allow to show high accuracy on both synthetic and real test sets, and investigates the effect of several training parameters [8]. Self-occlusion, (Top-right) Bone-length variation [7].…”
Section: Analysis Of "Le Sas" (mentioning)
confidence: 99%
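
The step described in the statements above, reprojecting per-pixel classification results into 3D and finding local modes, can be sketched roughly as follows. This is an illustration under stated assumptions, not the paper's implementation: it assumes a pinhole camera with made-up intrinsics (`FX`, `FY`, `CX`, `CY`), uses the per-pixel probability of one body part directly as the weight, and finds a mode with a weighted Gaussian mean shift. All function names and sample values are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point3 = Tuple[float, float, float]


def reproject(u: float, v: float, depth: float,
              fx: float, fy: float, cx: float, cy: float) -> Point3:
    """Back-project pixel (u, v) at the given depth into camera space (pinhole model)."""
    return ((u - cx) * depth / fx, (v - cy) * depth / fy, depth)


def kernel_weights(points: Sequence[Point3], weights: Sequence[float],
                   center: Point3, bandwidth: float) -> List[float]:
    """Per-point weight times a Gaussian kernel of the distance to `center`."""
    out = []
    for pt, w in zip(points, weights):
        sq_dist = sum((a - b) ** 2 for a, b in zip(pt, center))
        out.append(w * math.exp(-sq_dist / bandwidth ** 2))
    return out


def mean_shift_mode(points: Sequence[Point3], weights: Sequence[float],
                    start: Point3, bandwidth: float = 0.1,
                    max_iters: int = 50, tol: float = 1e-6):
    """Climb from `start` to a local mode of the weighted density.

    Returns (mode, score), where score is the summed kernel weight at the
    mode and serves here as a simple confidence value.
    """
    mode = start
    for _ in range(max_iters):
        k = kernel_weights(points, weights, mode, bandwidth)
        total = sum(k)
        if total == 0.0:
            break
        new_mode = tuple(
            sum(ki * pt[d] for ki, pt in zip(k, points)) / total
            for d in range(3)
        )
        shift = math.sqrt(sum((a - b) ** 2 for a, b in zip(new_mode, mode)))
        mode = new_mode
        if shift < tol:
            break
    score = sum(kernel_weights(points, weights, mode, bandwidth))
    return mode, score


# Assumed intrinsics and sample pixels: (u, v, depth in metres, P(part | pixel)).
FX = FY = 500.0
CX, CY = 320.0, 240.0
pixels = [
    (320, 240, 2.0, 0.9),
    (325, 240, 2.0, 0.8),
    (315, 240, 2.0, 0.8),
    (570, 490, 3.0, 0.1),  # distant, low-probability outlier
]
points = [reproject(u, v, d, FX, FY, CX, CY) for u, v, d, _ in pixels]
weights = [p for _, _, _, p in pixels]
mode, score = mean_shift_mode(points, weights, start=points[1])
print(mode, score)
```

In this toy input the three nearby pixels dominate the density, so the mode settles near their shared center and the distant outlier has a negligible effect on the proposal.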