Facial Expression Recognition via Joint Deep Learning of RGB-Depth Map Latent Representations

Oyedotun, Oyebade K.; Demisse, Girum G.; Shabayek, Abd El Rahman; Aouada, Djamila; Ottersten, Björn

doi:10.1109/iccvw.2017.374

Cited by 41 publications

(32 citation statements)

References 27 publications


“…BU-3DFE dataset contains both 3D and 2D data modalities for facial expressions of 100 different subjects. The data processing in [14] is followed for preparing the training data.
DepthMap-ResNet50 [14] 61.11
DepthMap-VGG19 [14] 28.06
DepthMap-scratch [14] 84.72
RGB-ResNet50 [14] 82.92
RGB-VGG19 [14] 81.…”

Section: BU-3DFE Facial Expression RGB-D Dataset (mentioning)

confidence: 99%

“…DepthMap-ResNet50 [14] 61.11
DepthMap-VGG19 [14] 28.06
DepthMap-scratch [14] 84.72
RGB-ResNet50 [14] 82.92
RGB-VGG19 [14] 81.
[16] 82.30
Distance+slopes+SVM [18] 87.10
2D+3D features fusion+SVM [19] 86.32
Geometric scattering representation+SVM [20] 84.80
Geometric+photometric attributes+VGG19 [21] 84.87
NF: RGB-ResNet50+DepthMap-scratch [14] 87.08
NF: RGB-VGG19+DepthMap-scratch [14] 89.31
Ours: RGB-ResNet50+DepthMap-scratch 89.86
Ours: RGB-VGG19+DepthMap-scratch 90.69
Table 2. Results comparison on BU-3DFE dataset
Also, a similar experimental setting is used for extracting latent representations from both data modalities; that is, using pre-trained models (ResNet-50 and VGG19) on ImageNet dataset for RGB data and training a DNN from scratch on depth data.…”

Section: BU-3DFE Facial Expression RGB-D Dataset (mentioning)

confidence: 99%

“…Results comparison on BU-3DFE dataset
Also, a similar experimental setting is used for extracting latent representations from both data modalities; that is, using pre-trained models (ResNet-50 and VGG19) on ImageNet dataset for RGB data and training a DNN from scratch on depth data. Furthermore, 10-fold cross-validation (CV) is employed for evaluating the models as in [14,15,16,17]. The results are given in Table 1, along with models trained on RGB only, depth map only and via naive fusion as in [14].…”

Section: BU-3DFE Facial Expression RGB-D Dataset (mentioning)

confidence: 99%

“…Furthermore, 10-fold cross-validation (CV) is employed for evaluating the models as in [14,15,16,17]. The results are given in Table 1, along with models trained on RGB only, depth map only and via naive fusion as in [14]. It will be seen that the proposed fusion approach gives improved results over naive fusion.…”

Section: BU-3DFE Facial Expression RGB-D Dataset (mentioning)

confidence: 99%
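The 10-fold cross-validation mentioned in the statement above can be illustrated with a minimal sketch. The function name `k_fold_indices`, the seed, and the sample count are hypothetical, and the sketch splits by sample index; the quoted text does not say whether the folds were formed per sample or per subject.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation.

    Assumption: samples are shuffled and split by index. A subject-wise
    split would group indices by subject before forming the folds.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        # All remaining folds form the training set for this round.
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


if __name__ == "__main__":
    for round_no, (train, test) in enumerate(k_fold_indices(25), start=1):
        print(round_no, len(train), len(test))
```

Each sample is used for testing exactly once, and the reported score would typically be the mean accuracy over the ten rounds.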


Learning to Fuse Latent Representations for Multimodal Data

Oyedotun; Aouada; Ottersten (2019)

ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Self Cite


Multimodal learning leverages data from different modalities to improve the performance of a trained model. Typically, latent representations extracted from multimodal data are provided via direct feature fusion for end-to-end training of a deep neural network towards a specific task. However, the informativeness of the different data modalities can easily vary across a collected dataset. As such, naively or directly fusing the latent representations obtained for one modality and the other, as is commonly done in state-of-the-art works, may burden the model in finding concise representations that are indeed useful for learning. In this paper, we propose to instead learn the fusion of latent representations for multimodal data by using a modality gating mechanism that allows the dynamic weighting of extracted latent representations based on their informativeness. Extensive experiments using the BU-3DFE dataset for facial expression recognition and the Washington object classification multimodal RGB-D dataset show that learning the fusion of the latent representations for different data modalities leads to better model generalization than the conventional naive fusion method.
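To make the contrast between naive fusion and gated fusion concrete, here is a minimal sketch. It is not the authors' architecture: the abstract does not specify the gate's form, so the scalar sigmoid gate, the complementary weighting `g` and `1 - g`, and every name below (`naive_fusion`, `gated_fusion`, `w_rgb`, `w_depth`, `bias`) are assumptions for illustration. In a trained model the gate parameters would be learned; here they are fixed by hand.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def naive_fusion(z_rgb, z_depth):
    """Direct feature fusion: concatenate the two latent vectors unchanged."""
    return list(z_rgb) + list(z_depth)


def gated_fusion(z_rgb, z_depth, w_rgb, w_depth, bias=0.0):
    """Weight each modality by a per-sample gate, then concatenate.

    Assumption: a single scalar gate g in (0, 1) is computed from both
    latents; RGB is scaled by g and depth by (1 - g).
    """
    score = (
        sum(w * z for w, z in zip(w_rgb, z_rgb))
        + sum(w * z for w, z in zip(w_depth, z_depth))
        + bias
    )
    g = sigmoid(score)
    fused = [g * z for z in z_rgb] + [(1.0 - g) * z for z in z_depth]
    return fused, g


if __name__ == "__main__":
    z_rgb, z_depth = [1.0, 2.0], [4.0]
    print(naive_fusion(z_rgb, z_depth))
    print(gated_fusion(z_rgb, z_depth, [0.0, 0.0], [0.0], bias=0.0))
```

Because the gate depends on the input latents, the relative weight of each modality can change from sample to sample, which is the "dynamic weighting" the abstract refers to; naive fusion always passes both latents through at full weight.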


“…The two pieces of information combined provide more dimensions to model and process, e.g. [9,26,16]. This richer information is desired in several scenarios.…”

Section: Introduction (mentioning)

confidence: 99%

3DBodyTex: Textured 3D Body Dataset

Saint; Ahmed; Shabayek; et al. (2018)

2018 International Conference on 3D Vision (3DV)

Self Cite


In this paper, a dataset, named 3DBodyTex, of static 3D body scans with high-quality texture information is presented along with a fully automatic method for body model fitting to a 3D scan. 3D shape modelling is a fundamental area of computer vision that has a wide range of applications in the industry. It is becoming even more important as 3D sensing technologies are entering consumer devices such as smartphones. As the main output of these sensors is the 3D shape, many methods rely on this information alone. The 3D shape information is, however, very high dimensional and leads to models that must handle many degrees of freedom from limited information. Coupling texture and 3D shape alleviates this burden, as the texture of 3D objects is complementary to their shape. Unfortunately, high-quality texture content is lacking from commonly available datasets, and in particular in datasets of 3D body scans. The proposed 3DBodyTex dataset aims to fill this gap with hundreds of high-quality 3D body scans with high-resolution texture. Moreover, a novel fully automatic pipeline to fit a body model to a 3D scan is proposed. It includes a robust 3D landmark estimator that takes advantage of the high-resolution texture of 3DBodyTex. The pipeline is applied to the scans, and the results are reported and discussed, showcasing the diversity of the features in the dataset.


Facial Expression Recognition with an Attention Network Using a Single Depth Image

Jian-min; Xie; et al. (2020)

Communications in Computer and Information Science
