2020
DOI: 10.1109/tcsvt.2019.2952779

Efficient Convolutional Neural Networks for Depth-Based Multi-Person Pose Estimation

Abstract: Achieving robust multi-person 2D body landmark localization and pose estimation is essential for human behavior and interaction understanding as encountered for instance in HRI settings. Accurate methods have been proposed recently, but they usually rely on rather deep Convolutional Neural Network (CNN) architecture, thus requiring large computational and training resources. In this paper, we investigate different architectures and methodologies to address these issues and achieve fast and accurate multi-perso…

Citations: cited by 35 publications (17 citation statements)
References: 33 publications (63 reference statements)

“…From the perspective of the development process of image text description, it can be divided into three stages: template-based image text description method; retrieval-based image text description method; deep learning-based image text description method [ 1 ]. Before the deep learning method was proposed, most of the image description methods used template-based and retrieval-based methods.…”
Section: Related Work (mentioning)
confidence: 99%
“…Three architectures are considered. The first two are the efficient pose machines based on residual modules (RPM) and the one based on MobileNets (MPM) introduced in [14]. These are lightweight CNNs that refine predictions with a series of prediction stages and are designed for efficient 2D pose estimation with real-time performance, see Fig.…”
Section: A CNN-based 2D Pose Estimation (mentioning)
confidence: 99%
“…2D CNN architectures and training. We keep the performance-efficiency trade-off reported in [14] and experiment with RPM with 2 stages and MPM with 4 stages. We configure the Hourglass architecture (HG) to 2 stages, as it was shown that performance saturates at this point [16].…”
Section: Implementation Details (mentioning)
confidence: 99%
“…Chen et al [7] created an automated toolchain to synthesize RGB images from 3D poses. Regarding depth features, Martinez et al [21] combined multi-person synthetic depth data with real sensor backgrounds.…”
Section: Body Pose Datasets (mentioning)
confidence: 99%