Learning Robust Objective Functions with Application to Face Model Fitting

Wimmer, Matthias; Pietzsch, Sylvia; Stulp, Freek; Radig, Bernd

doi:10.1007/978-3-540-74936-3_49

Cited by 3 publications

(6 citation statements)

References 5 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…Nevertheless, we apply this ideal objective function to annotated training images and obtain ideal training data for learning a local objective function f n,l for the model point n and its characteristic direction l. The key idea behind our approach is that since the training data is generated by an ideal objective function, the learned function will also be approximately ideal. This has already been shown in [14]. Figure 1 (right) illustrates the proposed five-step procedure.…”

Section: Learning Objective Functions From Image Annotationssupporting

confidence: 57%

“…Our previous work identifies the objective function as an essential component fitting models to single images [14]. This function evaluates how well a particular model fits to an image.…”

Section: Introductionmentioning

confidence: 99%

“…As a solution to this challenge, we propose to conduct a five-step methodology that learns robust local objective functions from annotated example images. We investigated this approach for 2D models so far [14]. This paper extends our methodology in order to generate objective functions that are capable of handling 3D models as well.…”

Section: Introductionmentioning

confidence: 99%

“…

…”

mentioning

confidence: 99%

See 3 more Smart Citations

Initial Pose Estimation for 3D Model Tracking Using Learned Objective Functions

Wimmer

Radig

Computer Vision – ACCV 2007

Self Cite

View full text Add to dashboard Cite

Tracking 3D models in image sequences essentially requires determining their initial position and orientation. Our previous work [14] identifies the objective function as a crucial component for fitting 2D models to images. We state preferable properties of these functions and we propose to learn such a function from annotated example images. This paper extends this approach by making it appropriate to also fit 3D models to images. The correctly fitted model represents the initial pose for model tracking. However, this extension induces nontrivial challenges such as out-of-plane rotations and self occlusion, which cause large variation to the model's surface visible in the image. We solve this issue by connecting the input features of the objective function directly to the model. Furthermore, sequentially executing objective functions specifically learned for different displacements from the correct positions yields highly accurate objective values.

show abstract

Section: Learning Objective Functions From Image Annotationssupporting

confidence: 57%

“…Our previous work identifies the objective function as an essential component fitting models to single images [14]. This function evaluates how well a particular model fits to an image.…”

Section: Introductionmentioning

confidence: 99%

Section: Introductionmentioning

confidence: 99%

“…

…”

mentioning

confidence: 99%

See 2 more Smart Citations

Initial Pose Estimation for 3D Model Tracking Using Learned Objective Functions

Wimmer

Radig

Computer Vision – ACCV 2007

Self Cite

View full text Add to dashboard Cite

show abstract

“…This heuristic approach relies on the designer's intuition about a good measure of fitness. Our earlier publications (Wimmer et al, 2007b;Wimmer et al, 2007a) show that this methodology is erroneous and tedious.…”

Section: Video Low-level Descriptorsmentioning

confidence: 99%

Low-Level Fusion of Audio and Video Feature for Multi-Modal Emotion Recognition

Wimmer

Schuller

Arsic³

et al. 2008

Proceedings of the Third International Conference on Computer Vision Theory and Applications

View full text Add to dashboard Cite

Bimodal emotion recognition through audiovisual feature fusion has been shown superior over each individual modality in the past. Still, synchronization of the two streams is a challenge, as many vision approaches work on a frame basis opposing audio turn-or chunk-basis. Therefore, late fusion schemes such as simple logic or voting strategies are commonly used for the overall estimation of underlying affect. However, early fusion is known to be more effective in many other multimodal recognition tasks. We therefore suggest a combined analysis by descriptive statistics of audio and video Low-Level-Descriptors for subsequent static SVM Classification. This strategy also allows for a combined feature-space optimization which will be discussed herein. The high effectiveness of this approach is shown on a database of 11.5h containing six emotional situations in an airplane scenario.

show abstract