Frankenstein: Learning Deep Face Representations Using Small Data

Hu, Guosheng; Peng, Xiaojiang; Yang, Yongxin; Hospedales, Timothy M.; Verbeek, Jakob

doi:10.1109/tip.2017.2756450

Cited by 115 publications

(58 citation statements)

References 69 publications

(135 reference statements)

“…Aggressive data augmentation is, therefore, a must with CNN-based models. When trying to learn a CNN-based model from scratch, researchers typically augment the available training data by producing data variations with, e.g., geometric transformations, color modifications, addition of noise, and more recently also by synthesizing samples of artificial identities, as, for example, described in [13].…”

Section: A Learning Strategies (mentioning)

confidence: 99%
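
As a rough illustration of the classical augmentations named in the statement above (a geometric transformation, a color modification, and added noise), here is a minimal pure-Python sketch on a tiny grayscale image stored as nested lists. The function names, the parameter ranges, and the toy image are all hypothetical and are not taken from the cited papers. The sketch does not cover the synthesis of artificial identities mentioned at the end of the statement.

```python
import random


def hflip(img):
    """Geometric transformation: mirror every row left to right."""
    return [row[::-1] for row in img]


def shift_brightness(img, delta):
    """Color modification: add a constant, clamping pixels to [0, 255]."""
    return [[min(255, max(0, p + delta)) for p in row] for row in img]


def add_gaussian_noise(img, sigma, rng):
    """Noise: perturb each pixel with zero-mean Gaussian noise, then clamp."""
    return [
        [min(255, max(0, round(p + rng.gauss(0.0, sigma)))) for p in row]
        for row in img
    ]


def augment(img, rng):
    """Produce one random variation of img (assumed ranges, for illustration)."""
    out = [list(row) for row in img]
    if rng.random() < 0.5:          # flip about half of the samples
        out = hflip(out)
    out = shift_brightness(out, rng.randint(-20, 20))
    return add_gaussian_noise(out, sigma=5.0, rng=rng)


# Toy 2x3 grayscale image; a real pipeline would operate on image arrays.
image = [[0, 64, 128], [32, 96, 255]]
rng = random.Random(0)              # seeded so runs are repeatable
variants = [augment(image, rng) for _ in range(4)]
```

Each call to `augment` yields a new training sample with the same shape and label as the input, which is how a small set of images can be expanded before training.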

“…In this paper we address the problem of training CNNs with limited training data and strive to develop an effective CNN-based model for ear recognition. Existing approaches to CNN training with small amounts of training data typically include i) metric-learning approaches, where training is performed with image pairs (or even triplets) instead of single images [8], [9], ii) data augmentation techniques that in addition to geometric and color perturbations of the existing training data also include the generation of synthetic data samples [10], [11], [12], [13], and iii) using existing CNNs (trained for related recognition problems) as so-called "black-box" feature extractors, on top of which additional classifiers are trained and used for recognition [14]. Here, we build on these approaches and successfully develop a CNN model for ear recognition by exploring different strategies to network training, i.e.…”

Section: Introduction (mentioning)

confidence: 99%

Training Convolutional Neural Networks with Limited Training Data for Ear Recognition in the Wild

Emeršič, Štepec, Štruc, et al. 2017

2017 12th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2017)

Identity recognition from ear images is an active field of research within the biometric community. The ability to capture ear images from a distance and in a covert manner makes ear recognition technology an appealing choice for surveillance and security applications as well as related application domains. In contrast to other biometric modalities, where large datasets captured in uncontrolled settings are readily available, datasets of ear images are still limited in size and mostly of laboratory-like quality. As a consequence, ear recognition technology has not benefited yet from advances in deep learning and convolutional neural networks (CNNs) and is still lagging behind other modalities that experienced significant performance gains owing to deep recognition technology. In this paper we address this problem and aim at building a CNN-based ear recognition model. We explore different strategies towards model training with limited amounts of training data and show that by selecting an appropriate model architecture, using aggressive data augmentation and selective learning on existing (pre-trained) models, we are able to learn an effective CNN-based model using a little more than 1300 training images. The result of our work is the first CNN-based approach to ear recognition that is also made publicly available to the research community. With our model we are able to improve on the rank one recognition rate of the previous state-of-the-art by more than 25% on a challenging dataset of ear images captured from the web (a.k.a. in the wild).

“…Verification accuracy can be affected by the type of bounding box used. In addition, most recent face recognition and verification methods [35,31,33,5,10,34] use some kind of 2D or 3D alignment procedure [41,15,28,8]. All these variables can lead to changes in performance of deep networks.…”

Section: Introduction (mentioning)

confidence: 99%

The Do’s and Don’ts for CNN-Based Face Verification

Bansal, Castillo, Ranjan, et al. 2017

2017 IEEE International Conference on Computer Vision Workshops (ICCVW)

While the research community appears to have developed a consensus on the methods of acquiring annotated data, design and training of CNNs, many questions still remain to be answered. In this paper, we explore the following questions that are critical to face recognition research: (i) Can we train on still images and expect the systems to work on videos? (ii) Are deeper datasets better than wider datasets? (iii) Does adding label noise lead to improvement in performance of deep networks? (iv) Is alignment needed for face recognition? We address these questions by training CNNs using CASIA-WebFace, UMD-Faces, and a new video dataset and testing on YouTube-Faces, IJB-A and a disjoint portion of UMDFaces datasets. Our new data set, which will be made publicly available, has 22,075 videos and 3,735,476 human annotated frames extracted from them.

“…Hypothesis: Removing the confounding factor stress would aid in creating models that are more generalizable across datasets. Previous research has shown that laboratory collected datasets are too small and often fail to capture the complete distribution of the domain [18,28] present in the real world. These datasets are often plagued with unintentional correlational factors [27,28].…”

Section: Question (mentioning)

confidence: 99%

Controlling for Confounders in Multimodal Emotion Classification via Adversarial Learning

Jaiswal, Aldeneh, Provost 2019

2019 International Conference on Multimodal Interaction

Various psychological factors affect how individuals express emotions. Yet, when we collect data intended for use in building emotion recognition systems, we often try to do so by creating paradigms that are designed just with a focus on eliciting emotional behavior. Algorithms trained with these types of data are unlikely to function outside of controlled environments because our emotions naturally change as a function of these other factors. In this work, we study how the multimodal expressions of emotion change when an individual is under varying levels of stress. We hypothesize that stress produces modulations that can hide the true underlying emotions of individuals and that we can make emotion recognition algorithms more generalizable by controlling for variations in stress. To this end, we use adversarial networks to decorrelate stress modulations from emotion representations. We study how stress alters acoustic and lexical emotional predictions, paying special attention to how modulations due to stress affect the transferability of learned emotion recognition models across domains. Our results show that stress is indeed encoded in trained emotion classifiers and that this encoding varies across levels of emotions and across the lexical and acoustic modalities. Our results also show that emotion recognition models that control for stress during training have better generalizability when applied to new domains, compared to models that do not control for stress during training. We conclude that it is necessary to consider the effect of extraneous psychological factors when building and testing emotion recognition models.
