A CNN Model for Head Pose Recognition using Wholes and Regions

Behera, Ardhendu; Gidney, Andrew G; Wharton, Zachary; Robinson, Daniel N.; Quinn, Keiron

doi:10.1109/fg.2019.8756536

Cited by 10 publications (10 citation statements)

References 37 publications (108 reference statements)

“…A common observation is that the overall performance of the baselines and ROI-CNN [50] is low in VGGFace2 [54], MTFL [20], and AFLW [55] in comparison to MultiLab [50]. This is mainly due to the clutter in images.…”

Section: Results (mentioning)

confidence: 99%

Regional Attention Network (RAN) for Head Pose and Fine-Grained Gesture Recognition

Behera, Wharton, Ghahremani et al. 2023

IEEE Trans. Affective Comput.

Self Cite

Affect is often expressed via non-verbal body language such as actions/gestures, which are vital indicators for human behaviors. Recent studies on recognition of fine-grained actions/gestures in monocular images have mainly focused on modeling spatial configuration of body parts representing body pose, human-objects interactions and variations in local appearance. The results show that this is a brittle approach since it relies on the accurate body parts/objects detection. In this work, we argue that there exist local discriminative semantic regions, whose "informativeness" can be evaluated by the attention mechanism for inferring fine-grained gestures/actions. To this end, we propose a novel end-to-end Regional Attention Network (RAN), which is a fully Convolutional Neural Network (CNN) to combine multiple contextual regions through attention mechanism, focusing on parts of the images that are most relevant to a given task. Our regions consist of one or more consecutive cells and are adapted from the strategies used in computing HOG (Histogram of Oriented Gradient) descriptor. The model is extensively evaluated on ten datasets belonging to 3 different scenarios: 1) head pose recognition, 2) drivers state recognition, and 3) human action and facial expression recognition. The proposed approach outperforms the state-of-the-art by a considerable margin in different metrics.
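The abstract above describes regions built from one or more consecutive cells and combined through attention. As a rough illustration only, the following sketch enumerates such regions on a small grid and pools their features with softmax weights. It assumes regions are axis-aligned rectangles of adjacent cells and that attention is a plain softmax over per-region scores; neither detail is stated in the abstract, and the names `enumerate_regions` and `attention_pool` are hypothetical, not taken from the RAN paper.

```python
import math
from typing import List, Sequence, Tuple

# (row, col, height, width), all measured in grid cells. Hypothetical layout.
Region = Tuple[int, int, int, int]


def enumerate_regions(rows: int, cols: int) -> List[Region]:
    """List every axis-aligned rectangle of adjacent cells on a rows x cols grid.

    Assumption: "consecutive cells" is read here as a rectangular block.
    """
    regions: List[Region] = []
    for r in range(rows):
        for c in range(cols):
            for h in range(1, rows - r + 1):
                for w in range(1, cols - c + 1):
                    regions.append((r, c, h, w))
    return regions


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(features: Sequence[Sequence[float]],
                   scores: Sequence[float]) -> List[float]:
    """Combine per-region feature vectors as a softmax-weighted sum."""
    weights = softmax(scores)
    dim = len(features[0])
    return [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]


if __name__ == "__main__":
    print(len(enumerate_regions(3, 3)))
    print(attention_pool([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]))
```

In a real network the per-region scores would be learned rather than supplied by hand; they are passed in directly here to keep the sketch self-contained.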

Section: Results (mentioning)

confidence: 99%

“…This work builds on the published conference output [50], focusing on coarse head pose recognition from image intensities using ROIs. The proposed RAN makes a substantial advance to it in two aspects: (i) by integrating a novel attention mechanism to explore salient regions in images while making recognition decisions.…”

Section: Previous Work By Authors (mentioning)

confidence: 99%

Regional Attention Network (RAN) for Head Pose and Fine-Grained Gesture Recognition

Behera, Wharton, Ghahremani et al. 2023

IEEE Trans. Affective Comput.

Self Cite

“…The HPE methods have attracted increasing attention of researchers, especially when deep learning-based methods have become more and more prevalent in computer vision-related tasks. Behera et al. [26] learned multi-level features by exploring different image regions, which combine multiple local regions with the whole image for discrete pose classification. Limitation of pure classification based method is that they only predict the approximate range of head pose intervals, and they lack the ability for fine-grained estimation, which would inevitably obstruct the way to wider applications.…”

Section: B Methods Focusing On Algorithm Performance (mentioning)

confidence: 99%
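The limitation raised in the statement above, that pure classification only predicts an approximate pose interval, can be illustrated with a small sketch. It assumes a single yaw angle in degrees, a range of -90 to 90, and 30-degree bins; these values and the function names `pose_class` and `class_interval` are hypothetical and are not taken from either paper.

```python
from typing import Tuple


def pose_class(yaw_deg: float, bin_width: float = 30.0, max_abs: float = 90.0) -> int:
    """Map a continuous yaw angle to a discrete class index.

    Angles are clipped to [-max_abs, max_abs]; the upper bound falls in the last bin.
    """
    clipped = max(-max_abs, min(max_abs, yaw_deg))
    n_bins = int(2 * max_abs / bin_width)
    idx = int((clipped + max_abs) // bin_width)
    return min(idx, n_bins - 1)


def class_interval(idx: int, bin_width: float = 30.0, max_abs: float = 90.0) -> Tuple[float, float]:
    """Return the (lower, upper) yaw interval in degrees covered by a class index."""
    lower = -max_abs + idx * bin_width
    return (lower, lower + bin_width)


if __name__ == "__main__":
    # Two poses 20 degrees apart receive the same label, so a classifier
    # trained on these labels cannot tell them apart.
    for yaw in (5.0, 25.0):
        idx = pose_class(yaw)
        print(yaw, idx, class_interval(idx))
```

Under these assumed bins, yaw angles of 5 and 25 degrees share one class, which is the kind of information loss a regression-style estimator avoids.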

An End-to-End Task-Simplified and Anchor-Guided Deep Learning Framework for Image-Based Head Pose Estimation

Wang, Ullah 2020

IEEE Access

Image-based Head Pose Estimation (HPE) from an arbitrary view is still challenging due to the complex imaging conditions as well as the intrinsic and extrinsic property of the faces. Different from existing HPE methods combining additional cues or tasks, this paper solves the HPE problem by relieving problem complexity. Our method integrates the deep Task-Simplification oriented Image Regularization (TSIR) module with the Anchor-Guided Pose Estimation (AGPE) module, and formulate the HPE problem into a unified end-to-end learning framework. In this paper, we define anchors as images that strictly obey the "gravity rule in camera", which follows the assumption that camera coordinate of the vertical axis should always be consistent with that of the local head coordinate. We formulate image pair as the regularized image produced by TSIR along with its anchor counterpart, both of which are fed into the AGPE module for estimating fine-grained head poses. This paper also proposes an Anchor-Guided Pairwise Loss (AGPL), which describes the interdependent relevance of poses between each pair of images. The proposed method is evaluated and validated with sufficient experiments which show its effectiveness. Comprehensive experiments show that our approach outperforms the state-of-the-art image-based methods on both indoor and outdoor datasets.

INDEX TERMS: Head pose estimation, task-simplification oriented image regularization, anchor-guided pose estimation, anchor-guided pairwise loss, deep learning framework.

“…There are a number of datasets produced so far for head pose estimation [54,55]. Often facial landmarks are used to generate the ground-truth head poses by fitting a mean 3D face with the POSIT algorithm [26] since it is difficult to precisely measure (or manually annotate) them.…”

Section: Datasets and Evaluation Strategies (mentioning)

confidence: 99%

Rotation Axis Focused Attention Network (RAFA-Net) for Estimating Head Pose

Behera, Wharton, Hewage et al. 2021

Computer Vision – ACCV 2020

Self Cite

Head pose is a vital indicator of human attention and behavior. Therefore, automatic estimation of head pose from images is key to many applications. In this paper, we propose a novel approach for head pose estimation from a single RGB image. Many existing approaches often predict head poses by localizing facial landmarks and then solve 2D to 3D correspondence problem with a mean head model. Such approaches rely entirely on the landmark detection accuracy, an ad-hoc alignment step, and the extraneous head model. To address this drawback, we present an end-to-end deep network, which explores rotation axis (yaw, pitch and roll) focused innovative attention mechanism to capture the subtle changes in images. The mechanism uses attentional spatial pooling from a self-attention layer and learns the importance over fine-grained to coarse spatial structures and combine them to capture rich semantic information concerning a given rotation axis. The evaluation of our approach using three benchmark datasets is very competitive to state-of-the-arts, including with and without landmark-based methods.
