This is the third in a series of reports on the ongoing Face Recognition Vendor Test (FRVT) executed by the National Institute of Standards and Technology (NIST). The first two reports cover, respectively, the performance of one-to-one face recognition algorithms used for verification of asserted identities, and the performance of one-to-many face recognition algorithms used for identification of individuals in photo databases. This document extends those evaluations to document accuracy variations across demographic groups.

MOTIVATION
The recent expansion in the availability, capability, and use of face recognition has been accompanied by assertions that demographic dependencies could lead to accuracy variations and potential bias. A report from Georgetown University [14] noted prior studies [22], articulated sources of bias, described the potential impacts, particularly in a policing context, and discussed policy and regulatory implications. Additionally, this work is motivated by studies of demographic effects in more recent face recognition [9, 16, 23] and gender estimation algorithms [5, 36].

AIMS AND SCOPE
NIST has conducted tests to quantify demographic differences in contemporary face recognition algorithms. This report provides details about the recognition process, notes where demographic effects could occur, details specific performance metrics and analyses, gives empirical results, and recommends research into the mitigation of performance deficiencies. NIST intends this report to inform discussion and decisions about the accuracy, utility, and limitations of face recognition technologies. Its intended audience includes policy makers, face recognition algorithm developers, systems integrators, and managers of face recognition systems concerned with the mitigation of risks implied by demographic differentials.
This is the second in a series of reports on the performance of face recognition algorithms on faces occluded by protective face masks [2], commonly worn to reduce inhalation and exhalation of viruses. Prompted by the COVID-19 pandemic response, this is an ongoing study run under the Ongoing Face Recognition Vendor Test (FRVT) executed by the National Institute of Standards and Technology (NIST). In our first report [8], we tested "pre-pandemic" algorithms that had been submitted to FRVT 1:1 prior to mid-March 2020. This report augments its predecessor with results for more recent algorithms provided to NIST after mid-March 2020. While we do not have information on whether a particular algorithm was designed with face coverings in mind, the results show evidence that a number of developers have adapted their algorithms to support face recognition on subjects potentially wearing face masks. The algorithms tested were one-to-one algorithms submitted to the FRVT 1:1 Verification track. Future editions of this document will also report the accuracy of one-to-many algorithms.

WHAT'S NEW
This report includes:
- Results from evaluating 65 face recognition algorithms provided to NIST since mid-March 2020
- Assessment of when both the enrollment and verification images are masked (in addition to when only the verification image is masked)
- Results for red and white colored masks (in addition to light-blue and black)
- Cumulative results for 152 algorithms evaluated to date (submitted both prior to and after mid-March 2020)

MOTIVATION
Traditionally, face recognition systems (in cooperative settings) are presented with mostly non-occluded faces, which include primary facial features such as the eyes, nose, and mouth. However, there are a number of circumstances in which faces are occluded by masks, such as in pandemics, medical settings, excessive pollution, or laboratories.
Prompted by the COVID-19 pandemic, the widespread requirement that people wear protective face masks in public places has driven a need to understand how cooperative face recognition technology deals with occluded faces, often with just the periocular area and above visible. An increasing number of research publications have surfaced on the topic of face recognition on people wearing masks, along with face-masked research datasets [10]. A number of commercial providers have announced the availability of face recognition algorithms capable of handling face masks, and this report documents performance results for 65 algorithms submitted to NIST after mid-March 2020. This report includes results for all algorithms evaluated to date. At the time of this writing, we are not aware of any large-scale, independent, publicly reported evaluation of the effects of face mask occlusion on face recognition.

WHAT WE DID
The NIST Information Technology Laboratory (ITL) quantified the accuracy of face recognition algorithms on faces occluded by masks applied digitally to a large set of photos that has been used in an FRVT verification benchmark since 201...
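The one-to-one verification accuracy reported in these studies is typically summarized as a false non-match rate (FNMR) at a fixed false match rate (FMR). As a rough illustration of how such a figure is derived from comparison scores (this is an illustrative sketch, not NIST's evaluation code; the function name, score convention, and data are assumptions):

```python
import numpy as np

def fnmr_at_fmr(genuine_scores, impostor_scores, target_fmr=1e-5):
    """Estimate FNMR at the threshold where impostor scores yield target_fmr.

    genuine_scores: similarity scores from mated (same-person) comparisons
    impostor_scores: scores from non-mated (different-person) comparisons
    Convention assumed here: higher score means a stronger match.
    """
    impostor = np.sort(np.asarray(impostor_scores))
    # Pick the threshold so that the fraction of impostor scores at or
    # above it is (approximately) the target false match rate.
    k = int(np.ceil(target_fmr * len(impostor)))
    if k > 0:
        threshold = impostor[len(impostor) - k]
    else:
        threshold = impostor[-1] + 1.0  # no false matches tolerated
    # FNMR: fraction of genuine comparisons falling below the threshold.
    fnmr = float(np.mean(np.asarray(genuine_scores) < threshold))
    return threshold, fnmr
```

Masked-face effects would then show up as a higher FNMR at the same fixed FMR when the probe (or both) images are occluded.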
at NIST for designing robust software infrastructure for image and template storage and parallel execution of algorithms across our computers. Thanks also to Brian Cochran at NIST for providing highly available computers and network-attached storage.

DISCLAIMER
Specific hardware and software products identified in this report were used in order to perform the evaluations described in this document. In no case does identification of any commercial product, trade name, or vendor imply recommendation or endorsement by the National Institute of Standards and Technology, nor does it imply that the products and equipment identified are necessarily the best available for the purpose.
Introduction
Facial gender classification is an area studied in the Face Recognition Vendor Test (FRVT) Still Facial Images Track. While peripheral to automated face recognition, it has become a growing area of research, with potential use in various applications. The motivation for gender classification systems has grown in recent years, with the rise of the digital age and the increase in human-computer interaction. Gender-based indexing of face images, gender-targeted surveillance (e.g., monitoring gender-restricted areas), gender-adaptive targeted marketing (e.g., displaying gender-specific advertisements from digital signage), and passive gender demographic data collection are potential applications of automated gender classification.

NIST performed a large-scale empirical evaluation of facial gender classification algorithms, with participation from five commercial providers and one university, using large operational datasets comprised of facial images from visas and law enforcement mugshots, leveraging a combined corpus of close to 1 million images. NIST employed a lights-out, black-box testing methodology designed to model operational reality, where software is shipped and used "as-is" without subsequent algorithmic training. Core gender classification accuracy was baselined over a large dataset composed of images collected under well-controlled pose, illumination, and facial expression conditions, then assessed demographically by gender, age group, and ethnicity. Analyses on commonly benchmarked "in the wild" (i.e., unconstrained) datasets were conducted and compared with those from the constrained dataset. The impact of the number of image samples per subject was captured, and assessments of classification performance on sketches and gender verification accuracy were documented.

Key Results
Core Accuracy and Speed: Gender classification accuracy depends strongly on the provider of the core technology.
Broadly, there is a threefold difference between the most accurate and the least accurate algorithm in terms of gender classification error, which is the percentage of images classified incorrectly. The most accurate algorithm (E32D from NEC) correctly classifies the gender of a person over a constrained database of approximately 1 million images 96.5% of the time. All algorithms can perform gender classification on a single image in less than 0.25 seconds with one server-class processor. The most accurate algorithm, on average, performs classification in 0.04 seconds (Section 3.1).

Impact of Demographic Data on Accuracy: For a dataset of 240 thousand visa images, it is empirically observed that, overall, gender classification is more accurate for males than for females. All of the algorithms show a significant decrease in gender classification accuracy as age increases for adult females, with an empirically decreasing trend in accuracy seen in females past age 50. Gender classification is more accurate for adult males (ages 21-60) than for young boys (ages 0-10) for all of the algorithms. For females, gender classification ...
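The headline figure and the demographic breakdowns above all reduce to the same quantity, the classification error: the fraction of images assigned the wrong gender, computed overall and then again within each demographic subgroup. A minimal sketch of that computation (the record fields, group labels, and data are hypothetical, not the evaluation's actual interface):

```python
from collections import defaultdict

def classification_error(records):
    """Fraction of records whose predicted label differs from ground truth."""
    wrong = sum(1 for r in records if r["predicted"] != r["actual"])
    return wrong / len(records)

def error_by_group(records, key):
    """Classification error computed separately within each subgroup,
    e.g. key='age_band' to reproduce a per-age-group breakdown."""
    groups = defaultdict(list)
    for r in records:
        groups[r[key]].append(r)
    return {group: classification_error(rs) for group, rs in groups.items()}
```

Comparing the per-group errors against the overall error is what surfaces differentials such as lower accuracy for females past age 50 or for young boys.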