2020
DOI: 10.1109/tpami.2018.2885472

Face-from-Depth for Head Pose Estimation on Depth Images

Abstract: Depth cameras make it possible to set up reliable solutions for people monitoring and behavior understanding, especially when unstable or poor illumination conditions make common RGB sensors unusable. Therefore, we propose a complete framework for the estimation of the head and shoulder pose based on depth images only. A head detection and localization module is also included, in order to develop a complete end-to-end system. The core element of the framework is a Convolutional Neural Network, called POSEidon+, that re…

Cited by 75 publications (48 citation statements)
References 85 publications
“…We relaxed the structure of the classical hourglass architecture performing less upsampling and downsampling operations in order to preserve the structural coherence between input and output. We found that using the half of feature maps described in [12] at each layer in both Generator and Discriminator networks sped up the training without a significant reduction of qualitative performance.…”
Section: B Architecture (mentioning)
confidence: 91%
“…A previous work starting from the same depth images has designed an Autoencoder to create gray-level faces from depth, with the final goal of head estimation [6]. An extension of this work is performed in [12] where a GAN is trained for the same final goal. In this paper, we compare our architecture with [9], using a similar dataset, other datasets, and with some probe perceptual tasks.…”
Section: Related Work (mentioning)
confidence: 99%
“…The first input is represented by the raw depth map, while the second is a Motion Image, obtained running the Optical Flow algorithm (Farneback implementation) on a sequence of depth images. The third input is generated by a network called Face-from-Depth [6], that is able to reconstruct gray-level face images starting from the related depth images. All these three inputs are then processed by a regressive Convolutional Neural Network (CNN) [7] that finally outputs the value of the yaw, pitch and roll 3D angles expressed as continuous values.…”
Section: Head Pose Estimation (mentioning)
confidence: 99%
“…We propose a deep learning-based framework based on 2 sequential modules. The first one is an autoencoder network [6], i.e. an encoder-decoder architecture whose goal is to reconstruct the input frame, while the second one is a binary classifier network [4], predicting if the input frame contains or not an anomaly (i.e.…”
Section: Proposed Framework (mentioning)
confidence: 99%