Real time head pose estimation with random regression forests

Fanelli, Gabriele; Gall, Jürgen; Van Gool, Luc

doi:10.1109/cvpr.2011.5995458

Cited by 392 publications

(353 citation statements)

References 23 publications

Supporting

Mentioning

351

Contrasting

Unclassified


“…A success of such methods in 3D full-body pose estimation is evident from recent results that use Microsoft Kinect sensor (Girshick et al, 2011; Sun et al, 2012); such discriminative methods have also proved effective for other problems, including image-based 3D pose (Bo and Sminchisescu, 2010; Kanaujia et al, 2007; Shakhnarovich et al, 2003; Sminchisescu et al, 2006; Urtasun and Darrell, 2008), head pose (Fanelli et al, 2011) and body shape (Chen et al, 2011; Sigal et al, 2007) estimation. The typical goal of discriminative regression methods is to learn a direct (and sometimes multi-modal) mapping, $f : \mathbb{R}^{d_x} \to \mathbb{R}^{d_y}$, from features (e.g., computed from image or depth data) to pose (e.g., 3D position and orientation of the head, or full 3D pose of the body encoded by joint positions or joint angles).…”

Section: Introduction (mentioning)

confidence: 97%

Domain Adaptation for Structured Regression

2013


Discriminative regression models have proved effective for many vision applications (here we focus on 3D full-body and head pose estimation from image and depth data). However, dataset bias is common and can significantly degrade the performance of a trained model on target test sets. As we show, covariate shift, a form of unsupervised domain adaptation (USDA), can be used to address certain biases in this setting, but is unable to deal with more severe structural biases in the data. We propose an effective and efficient semi-supervised domain adaptation (SSDA) approach for addressing such more severe biases. The proposed SSDA is a generalization of USDA that can effectively leverage labeled data in the target domain when available. Our method amounts to projecting input features into a higher dimensional space (by construction well suited for domain adaptation) and estimating weights for the training samples based on the ratio of test and train marginals in that space. The resulting augmented weighted samples can then be used to learn a model of choice, alleviating the problems of bias in the data; as an example, we introduce the SSDA Twin Gaussian Process regression (SSDA-TGP) model. With this model we also address the issue of data sharing, where we are able to leverage samples from certain activities (e.g., walking, jogging) to improve predictive performance on very different activities (e.g., boxing). In addition, we analyze the relationship between domain similarity and the effectiveness of the proposed USDA vs. SSDA methods. Moreover, we propose a computationally efficient alternative to TGP (Bo and Sminchisescu, 2010) and its variants, called the direct TGP (dTGP). We show that our model outperforms a number of baselines on two public datasets: HumanEva and ETH Face Pose Range Image Dataset. We can also achieve an 8 to 15 times speedup in computation time over the traditional formulation of TGP, using the proposed direct formulation, with little to no loss in performance.
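The weighting step described in this abstract (training samples weighted by the ratio of test and train marginals) can be illustrated with a minimal sketch. This is plain covariate-shift importance weighting, not the SSDA-TGP model itself: the feature augmentation is omitted, both marginals are assumed to be 1D Gaussians, and a weighted linear fit stands in for the "model of choice". All function names and the toy data are hypothetical.

```python
import math
from statistics import mean, pstdev


def gaussian_pdf(x, mu, sigma):
    """Density of a 1D normal distribution N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def importance_weights(train_x, test_x):
    """Weight each training input by p_test(x) / p_train(x).

    Both marginals are modelled as single 1D Gaussians (an assumption made
    for brevity; any density or density-ratio estimator could be used).
    """
    mu_tr, sd_tr = mean(train_x), pstdev(train_x)
    mu_te, sd_te = mean(test_x), pstdev(test_x)
    return [gaussian_pdf(x, mu_te, sd_te) / gaussian_pdf(x, mu_tr, sd_tr)
            for x in train_x]


def weighted_linear_fit(xs, ys, ws):
    """Weighted least squares for y = a * x + b."""
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    cov = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    var = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    a = cov / var
    return a, my - a * mx


# Toy data: the training inputs cover [0, 3], the test inputs sit near the top.
train_x = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
train_y = [x * x for x in train_x]
test_x = [2.0, 2.5, 3.0, 3.5]

weights = importance_weights(train_x, test_x)
slope, intercept = weighted_linear_fit(train_x, train_y, weights)
```

Training points that resemble the test inputs receive larger weights, so the fitted line follows the steeper part of the curve where the test data lies rather than averaging over the whole training range.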



“…Papazov et al [25] also used a random forest-based framework, in a similar way to the methods in Refs. [22][23][24]. They replaced depth features by more elaborate triangular surface patch (TSP) features to ensure view-invariance.…”

Section: Head Pose Estimation (mentioning)

confidence: 99%

“…Fanelli et al [22][23][24] adopted a voting method to directly determine head pose. However, their feature selection method for depth images degenerates into using 2D features, i.e., the RGB information used in 2D images was replaced by xyz-coordinate values in depth images.…”

Section: Head Pose Estimation (mentioning)

confidence: 99%

Joint head pose and facial landmark regression from depth images

et al. 2017


This paper presents a joint head pose and facial landmark regression method with input from depth images for realtime application. Our main contributions are: firstly, a joint optimization method to estimate head pose and facial landmarks, i.e., the pose regression result provides supervised initialization for cascaded facial landmark regression, while the regression result for the facial landmarks can also help to further refine the head pose at each stage. Secondly, we classify the head pose space into 9 sub-spaces, and then use a cascaded random forest with a global shape constraint for training facial landmarks in each specific space. This classification-guided method can effectively handle the problem of large pose changes and occlusion. Lastly, we have built a 3D face database containing 73 subjects, each with 14 expressions in various head poses. Experiments on challenging databases show our method achieves state-of-the-art performance on both head pose estimation and facial landmark regression.


“…Similar to existing regression forests in literature including (Fanelli et al 2011; Shotton et al 2011; Denil et al 2014), at a split node, we randomly select a relatively small set of $s$ distinct features $\Phi := \{\phi_i\}_{i=1}^{s}$ from the $d$-dimensional space as candidate features (i.e. entries of the feature vector).…”

Section: The Split Criteria (mentioning)

confidence: 99%
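The split-node procedure quoted above (draw s distinct candidate features out of d, then choose among them) can be sketched as follows. This is an illustrative sketch only: the variance-reduction score and the exhaustive threshold search are assumptions chosen for brevity, not the criteria of any of the cited papers, and `best_split` is a hypothetical helper name.

```python
import random


def variance(values):
    """Population variance of a list of scalar targets (0 for an empty list)."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)


def best_split(X, y, s, rng):
    """Pick the best (gain, feature, threshold) among s randomly drawn features.

    X is a list of d-dimensional feature vectors, y a list of scalar targets.
    The score is the reduction in target variance (an assumed criterion).
    """
    d = len(X[0])
    candidates = rng.sample(range(d), s)  # s distinct feature indices
    parent = variance(y)
    best = None
    for j in candidates:
        # Try every observed value except the largest as a threshold.
        for threshold in sorted({row[j] for row in X})[:-1]:
            left = [t for row, t in zip(X, y) if row[j] <= threshold]
            right = [t for row, t in zip(X, y) if row[j] > threshold]
            child = (len(left) * variance(left)
                     + len(right) * variance(right)) / len(y)
            gain = parent - child
            if best is None or gain > best[0]:
                best = (gain, j, threshold)
    return best, candidates


# Toy data: only feature 2 separates the two target values cleanly.
X = [[0.1, 0.9, 0.2, 0.5],
     [0.8, 0.1, 0.3, 0.4],
     [0.4, 0.7, 0.8, 0.6],
     [0.6, 0.3, 0.9, 0.1]]
y = [0.0, 0.0, 1.0, 1.0]

# With s equal to d every feature is a candidate, so the result is deterministic.
(full_gain, full_feature, full_threshold), _ = best_split(X, y, 4, random.Random(0))

# With s smaller than d only a random subset of features is examined.
_, subset = best_split(X, y, 2, random.Random(0))
```

Keeping s small relative to d reduces the cost of each node and decorrelates the trees, at the price of sometimes missing the most informative feature at a given node.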

“…The information gains and split criteria, the usage of whole hand image patch rather than individual pixels, as well as the DOT features to be detailed later are also quite different. Meanwhile, various related regression forest models have been investigated recently: in Fanelli et al (2011), the head pose has 6 degree-of-freedom (DoF), which is divided into 2 parts: 3D translation and 3D orientation. In each leaf node, the distribution is approximated by a 3D Gaussian.…”

Section: Related Work (mentioning)

confidence: 99%
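The leaf model described in this statement can be illustrated with a small sketch, reading it as one 3D Gaussian for the translation part and one for the orientation part of the 6-DoF pose. The pose layout (three translation values followed by three Euler angles), the dictionary structure, and the function names are assumptions made for illustration, not details confirmed by the cited paper.

```python
def fit_gaussian_3d(samples):
    """Return (mean, covariance) of a list of 3D vectors."""
    n = len(samples)
    mean = [sum(s[k] for s in samples) / n for k in range(3)]
    cov = [[sum((s[a] - mean[a]) * (s[b] - mean[b]) for s in samples) / n
            for b in range(3)]
           for a in range(3)]
    return mean, cov


def make_leaf(poses):
    """Summarise the 6-DoF training poses that reached a leaf.

    Each pose is assumed to be (tx, ty, tz, yaw, pitch, roll): the first
    three entries are the 3D translation, the last three the 3D orientation.
    """
    translation = fit_gaussian_3d([p[:3] for p in poses])
    orientation = fit_gaussian_3d([p[3:] for p in poses])
    return {"translation": translation, "orientation": orientation}


# Two toy poses that differ only in x-translation and yaw.
leaf = make_leaf([(0.0, 0.0, 0.0, 10.0, 0.0, 0.0),
                  (2.0, 0.0, 0.0, 20.0, 0.0, 0.0)])
t_mean, t_cov = leaf["translation"]
o_mean, o_cov = leaf["orientation"]
```

At test time the stored means can serve as the leaf's pose vote, and the covariances give a measure of how much that vote should be trusted.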

Estimate Hand Poses Efficiently from Single Depth Images

Nanjappa, Zhang, et al. 2015

Int J Comput Vis


This paper aims to tackle the practically very challenging problem of efficient and accurate hand pose estimation from single depth images. A dedicated two-step regression forest pipeline is proposed: given an input hand depth image, step one mainly estimates the 3D location and in-plane rotation of the hand using a pixelwise regression forest. This is used in step two, which delivers the final hand pose estimate with a similar regression forest model based on the entire hand image patch. Moreover, our estimation is guided by internally executing a 3D hand kinematic chain model. For an unseen test image, the kinematic model parameters are estimated by a proposed dynamically weighted scheme. As a combined effect of these proposed building blocks, our approach is able to deliver more precise estimation of hand poses. In practice, our approach runs at 15.6 frames per second (FPS) on an average laptop when implemented on CPU, which is further sped up to 67.2 FPS when running on GPU. In addition, we introduce and make publicly available a data-glove annotated depth image dataset covering various hand shapes and gestures, which enables us to conduct quantitative analyses on real-world hand images. The effectiveness of our approach is verified empirically on both synthetic and the annotated real-world datasets for hand pose estimation, as well as related applications including part-based labeling and gesture classification. In addition to empirical studies, the consistency property of our approach is also theoretically analyzed.

