2D vs. 3D Deformable Face Models: Representational Power, Construction, and Real-Time Fitting

Matthews, Iain; Xiao, Jing; Baker, Simon

doi:10.1007/s11263-007-0043-2

Cited by 83 publications

(52 citation statements)

References 22 publications

(54 reference statements)


“…Head pose was measured from the 2D videos using a cylindrical head tracker of [19]. This tracker is person-independent, robust, and has concurrent validity with person-specific 2D+3D AAM [20] and with magnetic motion capture device [19]. The head pose (yaw, roll, and pitch) were measured with respect to the frontal pose.…”

Section: B Head Pose (mentioning)

confidence: 99%

A high-resolution spontaneous 3D dynamic facial expression database

Zhang, Liu, Cohn, et al. 2013

2013 10th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (FG)


Abstract: Facial expression is central to human experience. Its efficient and valid measurement is a challenge that automated facial image analysis seeks to address. Most publicly available databases are limited to 2D static images or video of posed facial behavior. Because posed and un-posed (aka "spontaneous") facial expressions differ along several dimensions including complexity and timing, well-annotated video of un-posed facial behavior is needed. Moreover, because the face is a three-dimensional deformable object, 2D video may be insufficient, and therefore 3D video archives are needed. We present a newly developed 3D video database of spontaneous facial expressions in a diverse group of young adults. Well-validated emotion inductions were used to elicit expressions of emotion and paralinguistic communication. Frame-level ground-truth for facial actions was obtained using the Facial Action Coding System. Facial features were tracked in both 2D and 3D domains using both person-specific and generic approaches. The work promotes the exploration of 3D spatiotemporal features in subtle facial expression, better understanding of the relation between pose and motion dynamics in facial action units, and deeper understanding of naturally occurring facial action.


“…The goal of NRSFM is to recover 3-D shape models from 2-D tracked landmarks, while SPA builds unbiased 2-D models from 3-D data. The learned 2-D model has the same representational power of a 3-D model but leads to faster fitting algorithms [15]. SPA uniformly samples the space of possible 3-D rigid transformations, and it is extremely efficient in space and time.…”

Section: Object (3) (mentioning)

confidence: 99%

Subspace Procrustes Analysis

Perez-Sala, Torre, Igual, et al. 2015

Computer Vision - ECCV 2014 Workshops


Abstract. Procrustes Analysis (PA) has been a popular technique to align and build 2-D statistical models of shapes. Given a set of 2-D shapes PA is applied to remove rigid transformations. Then, a non-rigid 2-D model is computed by modeling (e.g., PCA) the residual. Although PA has been widely used, it has several limitations for modeling 2-D shapes: occluded landmarks and missing data can result in local minima solutions, and there is no guarantee that the 2-D shapes provide a uniform sampling of the 3-D space of rotations for the object. To address previous issues, this paper proposes Subspace PA (SPA). Given several instances of a 3-D object, SPA computes the mean and a 2-D subspace that can simultaneously model all rigid and non-rigid deformations of the 3-D object. We propose a discrete (DSPA) and continuous (CSPA) formulation for SPA, assuming that 3-D samples of an object are provided. DSPA extends the traditional PA, and produces unbiased 2-D models by uniformly sampling different views of the 3-D object. CSPA provides a continuous approach to uniformly sample the space of 3-D rotations, being more efficient in space and time. Experiments using SPA to learn 2-D models of bodies from motion capture data illustrate the benefits of our approach.
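The classical pipeline that this abstract describes (remove rigid transformations with Procrustes Analysis, then model the residual with PCA) can be sketched as follows. This is a minimal illustration of standard 2-D Procrustes Analysis, not the SPA method itself; the function names, the iteration count, and the choice of unit Frobenius norm for scale are assumptions made for the example.

```python
import numpy as np

def align_similarity(shape, ref):
    """Align one (n, 2) shape to `ref`, removing translation, scale, and rotation."""
    x = shape - shape.mean(axis=0)          # remove translation
    y = ref - ref.mean(axis=0)
    x = x / np.linalg.norm(x)               # remove scale (unit Frobenius norm)
    # Orthogonal Procrustes: rotation r minimizing ||x @ r - y||_F
    u, _, vt = np.linalg.svd(x.T @ y)
    r = u @ vt
    if np.linalg.det(r) < 0:                # forbid reflections
        u[:, -1] *= -1
        r = u @ vt
    return x @ r

def generalized_procrustes(shapes, n_iter=10):
    """Iteratively align all shapes to their evolving mean (hypothetical helper)."""
    ref = shapes[0] - shapes[0].mean(axis=0)
    ref = ref / np.linalg.norm(ref)
    for _ in range(n_iter):
        aligned = np.stack([align_similarity(s, ref) for s in shapes])
        ref = aligned.mean(axis=0)
        ref = ref / np.linalg.norm(ref)
    return aligned, ref

def pca_shape_model(aligned, n_modes):
    """PCA on the aligned shapes: returns the mean, the modes, and singular values."""
    flat = aligned.reshape(len(aligned), -1)
    mean = flat.mean(axis=0)
    _, s, vt = np.linalg.svd(flat - mean, full_matrices=False)
    return mean, vt[:n_modes], s[:n_modes]
```

When the input shapes differ only by a similarity transformation, the aligned shapes coincide and the PCA residual vanishes; any remaining variance is what the non-rigid model has to explain.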


“…Standard active appearance models (2D AAMs) [29], [30] are not directly comparable to G-flow because they are purely 2D models, which track in the 2D image plane without regard to the 3D configuration of the vertices. However, there is a 3D extension of the active appearance model, the so-called combined 2D+3D active appearance model (2D+3D AAM) [8], [28], which is a real-time, online 3D tracking system. We classify it as a template-based model because its appearance model does not change over time.…”

Section: Relation To Other Algorithms For Tracking 3D (mentioning)

confidence: 99%

“…Perhaps this system's biggest weakness is that it cannot handle self- The table compares the features of G-flow with those of other approaches. The approaches compared are (left to right): Constrained optic flow [9], [10], [11]; 2D+3D active appearance models [8], [28]; the 3D generative template model of [7]; and our G-flow model.…”

Section: Relation To Other Algorithms For Tracking 3D (mentioning)

confidence: 99%

“…More sophisticated models are possible in which object texture is produced by a linear combination of texture maps in the same way that geometric deformations are modeled as a linear combination of key shapes. Subspace texture models have previously been used effectively for 3D face tracking [8], [28], for 2D tracking with Rao-Blackwellized particle filters [34], and for 2D tracking using a dynamic texture map [35]. Shape and texture parameters could also be permitted to be correlated.…”

Section: Sophisticated Texture Models (mentioning)

confidence: 99%
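The linear-combination idea in the statement above can be illustrated with a short sketch. It assumes the common convention of a mean plus a weighted sum of basis elements; the function name and array layout are hypothetical, and the same routine serves for key shapes or for texture maps.

```python
import numpy as np

def linear_instance(mean, basis, coeffs):
    """Return mean + sum_i coeffs[i] * basis[i].

    mean   : array of shape (...), e.g. (n_points, 2) for a shape
    basis  : array of shape (k, ...), k key shapes or texture maps
    coeffs : array of shape (k,), one weight per basis element
    """
    mean = np.asarray(mean, dtype=float)
    basis = np.asarray(basis, dtype=float)
    coeffs = np.asarray(coeffs, dtype=float)
    # Contract the coefficient axis against the first axis of the basis.
    return mean + np.tensordot(coeffs, basis, axes=1)
```

Setting all coefficients to zero returns the mean; allowing shape and texture coefficients to be correlated, as the statement suggests, would mean drawing both coefficient vectors from a joint model rather than independently.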


Tracking Motion, Deformation, and Texture Using Conditionally Gaussian Processes

Marks, Hershey, Movellan 2010

IEEE Trans. Pattern Anal. Mach. Intell.


We present a generative model and inference algorithm for 3D nonrigid object tracking. The model, which we call G-flow, enables the joint inference of 3D position, orientation, and nonrigid deformations, as well as object texture and background texture. Optimal inference under G-flow reduces to a conditionally Gaussian stochastic filtering problem. The optimal solution to this problem reveals a new space of computer vision algorithms, of which classic approaches such as optic flow and template matching are special cases that are optimal only under special circumstances. We evaluate G-flow on the problem of tracking facial expressions and head motion in 3D from single-camera video. Previously, the lack of realistic video data with ground truth nonrigid position information has hampered the rigorous evaluation of nonrigid tracking. We introduce a practical method of obtaining such ground truth data and present a new face video data set that was created using this technique. Results on this data set show that G-flow is much more robust and accurate than current deterministic optic-flow-based approaches.
