Efficient Convolutional Neural Networks for Depth-Based Multi-Person Pose Estimation

Martínez-González, Ángel; Villamizar, Michael; Canévet, Olivier; Odobez, Jean-Marc

doi:10.1109/tcsvt.2019.2952779

Cited by 35 publications

(17 citation statements)

References 33 publications

(63 reference statements)


“…From the perspective of the development process of image text description, it can be divided into three stages: template-based image text description method; retrieval-based image text description method; deep learning-based image text description method [ 1 ]. Before the deep learning method was proposed, most of the image description methods used template-based and retrieval-based methods.…”

Section: Related Work (mentioning)

confidence: 99%

Research on Multicamera Photography Image Art in BERT Motion Based on Deep Learning Mode

Zhao

Song

Tang

2022

Computational Intelligence and Neuroscience


In order to improve the artistic expression effect of photographic images, this article combines the deep learning model to conduct multicamera photographic image art research in BERT motion. Moreover, this article analyzes the external parameter errors caused in the calibration process and uses the checkerboard in the common field of view to calibrate the spatial coordinates of the corners of the board in multiple camera coordinate systems. In addition, this article aims to match the spatial coordinates of the corresponding points to each other and solve the rotation and translation matrix in the transformation process. Finally, this article uses the LM algorithm to optimize the calibration parameters of the camera and combines the deep learning algorithm to perform image processing. The experimental research results show that the research method of multicamera photography image art in BERT motion based on the deep learning mode proposed in this article can effectively improve the expression effect of image art.



“…Three architectures are considered. The two firsts are the efficient pose machines based on residual modules (RPM) and the one based on MobileNets (MPM) introduced in [14]. These are lightweight CNNs that refine predictions with a series of prediction stages and are designed for efficient 2D pose estimation with real-time performance, see Fig.…”

Section: A CNN-based 2D Pose Estimation (mentioning)

confidence: 99%

“…2D CNN architectures and training. We keep the performance-efficiency trade-off reported in [14] and experiment with RPM with 2 stages and MPM with 4 stages. We configure the Hourglass architecture (HG) to 2 stages it was shown that performance saturates at this point [16].…”

Section: Implementation Details (mentioning)

confidence: 99%

Residual Pose: A Decoupled Approach for Depth-based 3D Human Pose Estimation

Martínez-González

Villamizar

Canévet

et al. 2020

2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)

Self Cite


We propose to leverage recent advances in reliable 2D pose estimation with Convolutional Neural Networks (CNN) to estimate the 3D pose of people from depth images in multiperson Human-Robot Interaction (HRI) scenarios. Our method is based on the observation that using the depth information to obtain 3D lifted points from 2D body landmark detections provides a rough estimate of the true 3D human pose, thus requiring only a refinement step. In that line our contributions are threefold. (i) we propose to perform 3D pose estimation from depth images by decoupling 2D pose estimation and 3D pose refinement; (ii) we propose a deep-learning approach that regresses the residual pose between the lifted 3D pose and the true 3D pose; (iii) we show that despite its simplicity, our approach achieves very competitive results both in accuracy and speed on two public datasets and is therefore appealing for multi-person HRI compared to recent state-of-the-art methods.
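The decoupled idea in this abstract (lift 2D landmark detections to 3D using depth, then refine with a residual) can be illustrated with a minimal sketch. It assumes a pinhole camera model with intrinsics `fx`, `fy`, `cx`, `cy`, which the abstract does not specify. The names `lift_to_3d` and `apply_residual` are hypothetical, and the residual values stand in for the output of a learned regressor that is not shown here.

```python
from typing import List, Sequence, Tuple

Point3D = Tuple[float, float, float]


def lift_to_3d(
    keypoints_2d: Sequence[Tuple[int, int]],
    depth: Sequence[Sequence[float]],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
) -> List[Point3D]:
    """Back-project pixel landmarks (u, v) to camera-space 3D points.

    Assumption: pinhole model, with depth[v][u] holding the metric depth z.
    """
    points = []
    for u, v in keypoints_2d:
        z = depth[v][u]          # depth value at the detected landmark
        x = (u - cx) * z / fx    # horizontal offset scaled by depth
        y = (v - cy) * z / fy    # vertical offset scaled by depth
        points.append((x, y, z))
    return points


def apply_residual(
    lifted: Sequence[Point3D], residual: Sequence[Point3D]
) -> List[Point3D]:
    """Refine the rough lifted pose by adding a per-joint 3D residual.

    Assumption: `residual` would come from a trained regressor; here it is
    supplied by the caller as a placeholder.
    """
    return [
        (x + dx, y + dy, z + dz)
        for (x, y, z), (dx, dy, dz) in zip(lifted, residual)
    ]
```

In this reading, the lifted points give only a rough pose (depth at a landmark pixel lies on the body surface and may be corrupted by occlusion), which is why a refinement step is added on top.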


“…Chen et al [7] created an automated toolchain to synthesize RGB images from 3D poses. Regarding depth features, Martinez et al [21] combined multi-person synthetic depth data with real sensor backgrounds.…”

Section: Body Pose Datasets (mentioning)

confidence: 99%

A system for the generation of in-car human body pose datasets

Borges

Queirós

Oliveira

et al. 2020

Machine Vision and Applications


With the advent of autonomous vehicles, detection of the occupants' posture is crucial to tackle the needs of infotainment interaction or passive safety systems. Generative approaches have been recently proposed for human body pose in-car detection, but this type of approaches requires a large training dataset for a feasible accuracy. This requirement poses a difficulty, given the substantial time required to annotate such large amount of data. In the in-car scenario, this requirement risk increases even further, since a robust human body pose ground-truth system capable of working in it is needed but inexistent. Currently, the gold standard for human body pose capture is based on optical systems, requiring up to 39 visible markers for a plug-in gait model, which in this case are not feasible given the occlusions inside the car. Other solutions, such as inertial suits, also have limitations linked to magnetic sensitivity and global positioning drift. In this paper, a system for the generation of images for human body pose detection in an in-car environment is proposed. To this end, we propose to smartly combine inertial and optical systems to suppress their individual limitations: By combining the global positioning of 3 visible head markers provided by the optical system with the inertial suit's relative human body pose, we obtain an occlusion-ready, drift-free full-body global positioning system. This system is then spatially and temporally calibrated with a time-of-flight sensor, automatically obtaining in-car image data with (multi-person) pose annotations. Besides quantifying the inertial suit inherent sensitivity and accuracy, the feasibility of the overall system for human body pose capture in the in-car scenario was demonstrated. Our results quantify the errors associated with the inertial suit, pinpoint some sources of the system's uncertainty and propose how to minimize some of them. 
Finally, we demonstrate the feasibility of using system generated data (which was made publicly available), independently or mixed with two publicly available generic datasets (not in-car), to train 2 machine learning algorithms, demonstrating the improvement in their algorithmic accuracy for the in-car scenario.

