Validating Seed Data Samples for Synthetic Identities – Methodology and Uniqueness Metrics

Varkarakis, Viktor; Bazrafkan, Shabab; Costache, Gabriel; Corcoran, Peter

doi:10.1109/access.2020.3016097

Cited by 26 publications (10 citation statements)

References 25 publications (42 reference statements)

“…This indicates there is a higher density of lookalikes between Sy and Se than in a real population, which is here modeled by IJB-C. This is similar to the observation made in [32] using StyleGAN. In the article, they notice that this is caused by the presence of children in FFHQ, and thus also in StyleGAN's output distribution, while SOTA face recognition networks are typically not trained on faces of children and thus perform poorly on this population.…”

Section: Are StyleGAN2 Identities New? (supporting)

confidence: 82%

“…To assess the requirement of privacy, we need to verify that generated identities do not simply reproduce existing identities from the FFHQ dataset. We evaluate this by reproducing on our synthetic dataset an experiment originally proposed in [32] on the first version of StyleGAN. It consists in comparing the identity similarity between a synthetic dataset (Sy) and a seed dataset (Se), which is the dataset used to train the face generator.…”

Section: Are StyleGAN2 Identities New? (mentioning)

confidence: 99%
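
The comparison described in the statement above can be illustrated with a minimal sketch. It assumes each face has already been mapped to an embedding vector by some face recognition model, and that identity similarity is measured with cosine similarity; both are assumptions made for illustration, not details confirmed by the quoted text. The function names and the threshold are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors (assumed non-zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest_seed_similarity(synthetic, seed):
    """For each synthetic (Sy) embedding, the highest similarity to any seed (Se) embedding."""
    return [max(cosine_similarity(s, t) for t in seed) for s in synthetic]

def lookalike_rate(synthetic, seed, threshold):
    """Fraction of Sy samples whose closest Se match reaches the (hypothetical) threshold."""
    best = nearest_seed_similarity(synthetic, seed)
    return sum(score >= threshold for score in best) / len(best)
```

A high lookalike rate at a threshold where a real population shows few matches would suggest that the generator reproduces seed identities rather than producing new ones.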

On the use of automatically generated synthetic image datasets for benchmarking face recognition

Colbois, Pereira, Marcel. 2021. Preprint.

Preprint

The availability of large-scale face datasets has been key in the progress of face recognition. However, due to licensing issues or copyright infringement, some datasets are not available anymore (e.g. MS-Celeb-1M). Recent advances in Generative Adversarial Networks (GANs), to synthesize realistic face images, provide a pathway to replace real datasets by synthetic datasets, both to train and benchmark face recognition (FR) systems. The work presented in this paper provides a study on benchmarking FR systems using a synthetic dataset. First, we introduce the proposed methodology to generate a synthetic dataset, without the need for human intervention, by exploiting the latent structure of a StyleGAN2 model with multiple controlled factors of variation. Then, we confirm that (i) the generated synthetic identities are not data subjects from the GAN's training dataset, which is verified on a synthetic dataset with 10K+ identities; (ii) benchmarking results on the synthetic dataset are a good substitution, often providing error rates and system ranking similar to the benchmarking on the real dataset.

“…The closer an ROC curve is to unity, the better the performance of the FR model on the selected samples. More information regarding the ROC and its interpretation and use can be found in [40].…”

Section: Using ROC Curves as a Metric (mentioning)

confidence: 99%
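
As a rough illustration of the ROC idea mentioned in the statement above, the sketch below builds ROC points from two lists of comparison scores and summarises them with the area under the curve (AUC), which approaches 1 for a well-performing model. Reading "close to unity" as an AUC near 1 is an assumption about the quoted text, and the function names and score lists are hypothetical.

```python
def roc_points(genuine, impostor):
    """Return (false positive rate, true positive rate) pairs, one per score threshold.

    genuine:  similarity scores for same-identity pairs
    impostor: similarity scores for different-identity pairs
    """
    thresholds = sorted(set(genuine) | set(impostor), reverse=True)
    points = [(0.0, 0.0)]
    for t in thresholds:
        tpr = sum(s >= t for s in genuine) / len(genuine)
        fpr = sum(s >= t for s in impostor) / len(impostor)
        points.append((fpr, tpr))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area
```

When every genuine score exceeds every impostor score, the curve passes through the top-left corner and the area equals 1.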

“…The effect of these augmentations on the performance of a SoA FR method is quantified using Receiver Operating Characteristic curve (ROC) techniques. A similar approach was used recently to validate synthetic facial identities [40]. Note that a re-lighting augmentation approach was adopted as existing public datasets do not provide sufficient lighting variability.…”

Section: Introduction (mentioning)

confidence: 99%

Towards End-to-End Neural Face Authentication in the Wild -- Quantifying and Compensating for Directional Lighting Effects

Varkarakis, Yao, Corcoran. 2021. Preprint. Self-citation.

The recent availability of low-power neural accelerator hardware, combined with improvements in end-to-end neural facial recognition algorithms, provides enabling technology for on-device facial authentication. The present research work examines the effects of directional lighting on a State-of-Art (SoA) neural face recognizer. A synthetic re-lighting technique is used to augment data samples due to the lack of public data-sets with sufficient directional lighting variations. Top lighting and its variants (top-left, top-right) are found to have minimal effect on accuracy, while bottom-left or bottom-right directional lighting have the most pronounced effects. Following the fine-tuning of network weights, the face recognition model is shown to achieve close to the original Receiver Operating Characteristic curve (ROC) performance across all lighting conditions, and demonstrates an ability to generalize beyond the lighting augmentations used in the fine-tuning dataset. This work shows that SoA neural face recognition models can be tuned to compensate for directional lighting effects, removing the need for a pre-processing step prior to applying facial recognition.

“…The generator learns to distribute data from a real training dataset and the discriminator is trained to judge whether a sample is a real or a generated sample. The ultimate goal is to train the generator to produce high-quality synthetic images that can fool the discriminator [13,14,19]. At present, the StyleGAN is a representative of the most advanced GAN techniques and can produce sufficiently high-quality and photo-realistic samples with relatively low computation cost [13,20,21].…”

Section: Introduction (mentioning)

confidence: 99%

Automated Sewer Defects Detection Using Style-Based Generative Adversarial Networks and Fine-Tuned Well-Known CNN Classifier

et al. 2021

Automated sewer defects detection has become an important trend for better management and maintenance of urban sewer systems. Deep learning technology has developed rapidly and offers an innovative solution for automated detection in engineering applications. However, insufficient data and unbalanced samples pose a major challenge to deep learning model training. This study adopts the state-of-the-art Style-based Generative Adversarial Networks (StyleGANs) model and compares the performances of its two variants in producing high-quality synthetic sewer defects images. Seven well-known CNN models are further fine-tuned and trained using the synthetic images for automated sewer defects detection to examine the effects of StyleGANs on augmenting the detection performance. Results show that both StyleGANs are efficient in producing high-quality images with various styles and high-level details for multiple types of sewer defects. Specifically, the StyleGAN2-Adaptive Discriminator Augmentation (StyleGAN2-ADA) with the aid of Freeze Discriminator (Freeze-D) yields the best model performance. Among the adopted CNN classifiers, Inception_v3 achieves the highest detection accuracy. The mean detection accuracy is 94% (with a specific accuracy of 99.7%, 97%, 95.3% and 84% for tree root, residential wall, disjoint and obstacle, respectively) and confirms the reliability of the StyleGANs' performance. The study shows that StyleGANs provide a promising method to alleviate the limited and uneven dataset problem and can improve the deep learning model performance.
