MS-Celeb-1M: A Dataset and Benchmark for Large-Scale Face Recognition

Guo, Yandong; Zhang, Lei; Hu, Yuxiao; He, Xiaodong; Gao, Jianfeng

doi:10.1007/978-3-319-46487-9_6

Cited by 1,398 publications

(1,161 citation statements)

References 17 publications

Mentioning: 1,154 citation statements


“…4. To increase the retrieval difficulty, random 355k distractor images are sampled from the MS-Celeb-1M Dataset (Guo et al, 2016), as before taking care to include only true distractor people. The sampled distractor sets are constructed such that the number of faces per set follows the same distribution as in the Celebrity Together dataset.…”

Section: Evaluating On the Celebrity Together Dataset (mentioning)


Compact Deep Aggregation for Set Retrieval

Zhong, Arandjelović, Zisserman (2019). Lecture Notes in Computer Science


The objective of this work is to learn a compact embedding of a set of descriptors that is suitable for efficient retrieval and ranking, whilst maintaining discriminability of the individual descriptors. We focus on a specific example of this general problem, that of retrieving images containing multiple faces from a large scale dataset of images. Here the set consists of the face descriptors in each image, and given a query for multiple identities, the goal is then to retrieve, in order, images which contain all the identities, all but one, etc. To this end, we make the following contributions: first, we propose a CNN architecture, SetNet, to achieve the objective: it learns face descriptors and their aggregation over a set to produce a compact fixed length descriptor designed for set retrieval, and the score of an image is a count of the number of identities that match the query; second, we show that this compact descriptor has minimal loss of discriminability up to two faces per image, and degrades slowly after that, far exceeding a number of baselines; third, we explore the speed vs. retrieval quality trade-off for set retrieval using this compact descriptor; and, finally, we collect and annotate a large dataset of images containing varying numbers of celebrities, which we use for evaluation and release publicly.
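The abstract scores an image by counting how many of the queried identities it contains. A minimal sketch of that scoring rule follows, assuming per-face descriptors have already been extracted and that a face "matches" an identity when their cosine similarity exceeds a hypothetical threshold. This is the uncompressed per-face baseline, not SetNet's learned compact aggregation; the function names and the threshold value are illustrative only.

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit length so dot products equal cosine similarities."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def count_matched_identities(query: np.ndarray, faces: np.ndarray,
                             threshold: float = 0.5) -> int:
    """Number of query identities matched by at least one face in one image.

    query:     (q, d) array, one descriptor per queried identity.
    faces:     (f, d) array, descriptors of the faces found in the image.
    threshold: hypothetical cosine-similarity cut-off (an assumption).
    """
    if faces.shape[0] == 0:
        return 0  # no faces detected, so nothing can match
    sims = l2_normalize(query) @ l2_normalize(faces).T  # (q, f) similarities
    # An identity counts once if its best-matching face clears the threshold.
    return int((sims.max(axis=1) >= threshold).sum())


def rank_images(query: np.ndarray, images: list, threshold: float = 0.5):
    """Rank images so those containing all identities come first, then all but one, etc."""
    scores = [count_matched_identities(query, faces, threshold) for faces in images]
    # Python's sort is stable, so ties keep their original image order.
    order = sorted(range(len(images)), key=lambda i: -scores[i])
    return order, scores


if __name__ == "__main__":
    alice, bob, carol = np.eye(3)  # toy orthogonal "identity" descriptors
    query = np.stack([alice, bob])
    images = [np.stack([carol]), np.stack([alice, carol]), np.stack([alice, bob])]
    print(rank_images(query, images))
```

Scoring every face of every image like this grows with the total number of faces stored, which is the cost that a compact fixed-length per-image descriptor such as SetNet's is meant to reduce.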


“…The Sub-MS-Celeb dataset is rebuilt from MS-Celeb [12] dataset contains 87139 face images from 2589 classes after removing the dirty face images and non-frontal face images, and it is chosen as the source domain. All images are aligned and cropped to 64*64 pixels according to five landmarks: two eyes, nose and mouth corners (see Fig.3).…”

Section: Datasets and Preprocessing (mentioning)


A Novel Kinship Verification Method Based on Deep Transfer Learning and Feature Nonlinear Mapping

Yang, Wu (2017). dtcse


There are some problems when discriminative features are used in traditional kinship verification methods, such as focusing on local region information, containing a lot of noise in non-face regions and redundant information in overlapping regions, manual parameter setting, and high dimensionality. To solve these problems, a novel kinship verification method based on deep transfer learning and feature nonlinear mapping is proposed in this paper. First, a new deep learning model trained on a face recognition dataset is transferred to the kinship datasets to extract high-level features. Second, siamese multi-layer perceptrons and triangular similarity metric learning are combined to reduce the dimensionality of the feature vector by nonlinear mapping, while ensuring a smaller distance between kin pairs and a larger distance between non-kin pairs. Finally, the cosine similarity of feature vector pairs is computed, and a traditional classifier, such as an SVM, is used. Experiments on the TSKinFace, KinFaceW-I and KinFaceW-II datasets indicate that the proposed method outperforms traditional methods.
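The final step of the abstract, computing the cosine similarity of each feature-vector pair and then classifying it, can be sketched as follows. This assumes the (mapped) features for both members of each pair are already available. In place of the SVM the abstract mentions, the sketch swaps in a simpler single learned similarity threshold, which is a one-dimensional linear decision rule; the function names are hypothetical.

```python
import numpy as np


def cosine_similarity_pairs(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Row-wise cosine similarity between paired feature vectors a[i] and b[i]."""
    num = np.sum(a * b, axis=1)
    den = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1)
    return num / den


def fit_threshold(sims: np.ndarray, labels: np.ndarray) -> float:
    """Pick the similarity threshold that maximizes accuracy on training pairs.

    labels: 1 for kin pairs, 0 for non-kin pairs.
    This threshold rule stands in for the SVM described in the abstract.
    """
    candidates = np.unique(sims)  # every observed similarity is a candidate cut-off
    accs = [np.mean((sims >= t) == labels) for t in candidates]
    return float(candidates[int(np.argmax(accs))])


def predict_kin(sims: np.ndarray, threshold: float) -> np.ndarray:
    """Label a pair as kin (1) when its similarity reaches the threshold."""
    return (sims >= threshold).astype(int)
```

Because the metric-learning step is trained to pull kin pairs together and push non-kin pairs apart, a single cut-off on cosine similarity is already a reasonable baseline. An SVM becomes more useful when several similarity features are combined.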


“…The dataset is constructed by Microsoft and is available for noncommercial use. [9] further describes the process of assembling the images and the metric used for the choice of the 100K celebrity provided in the dataset. We used the whole dataset for the training of our neural network.…”

Section: FRGC (mentioning)


“…The training datasets that were used in this work are the FRGC dataset (because it is a relatively big dataset at the time that it was introduced), and the MS-celeb-1M [9] (because this is to our knowledge among the biggest publicly available datasets). More details about these databases are given below.…”

Section: Training Datasets (mentioning)


“…CMU has already worked in this direction, but their published results of 92.92% are far from the 99.96% that Google got on LFW. We have chosen to exploit the publicly available MS-celeb-1M [9] dataset. We evaluate the performance of our newly trained system on the (LFW), as well as the MOBIO [10] dataset (a very challenging audio-visual dataset).…”

Section: Introduction (mentioning)


State-of-the-art face recognition performance using publicly available software and datasets

Hmani, Petrovska‐Delacrétaz (2018). 2018 4th International Conference on Advanced Technologies for Signal and Image Processing (ATSIP)


We are interested in the reproducibility of face recognition systems. By reproducibility we mean: are the scientific community, and researchers from different institutions, capable of reproducing the latest results published by a big company that has huge computational power and huge proprietary databases at its disposal? With the constant advancements in GPU computation power and the availability of open-source software, the reproducibility of published results should not be a problem. But if the architectures of the systems are private and the databases are proprietary, published results cannot easily be reproduced. To tackle this problem, we focus on training and evaluating face recognition systems on publicly available data and software. We are also interested in comparing the best Deep Neural Network (DNN) based results with a baseline "classical" system. This paper exploits the OpenFace open-source system to generate a deep convolutional neural network model using publicly available datasets. We study the impact of the size and quality of the datasets, and compare the performance to a classical face recognition approach. Our focus is to have a fully reproducible model. To this end, we used publicly available datasets (FRGC, MS-celeb-1M, MOBIO, LFW), as well as publicly available software (OpenFace), to train our face recognition model. Our best trained model achieves 97.52% accuracy on the Labeled Faces in the Wild (LFW) dataset, which is lower than Google's best reported result of 99.96% but slightly better than Facebook's reported result of 97.35%. We also evaluated our best model on the challenging video dataset MOBIO and report results competitive with the best reported results on this database.
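LFW accuracy figures such as the 97.52% above are usually pair-verification accuracies averaged over 10 folds, with the decision threshold tuned on the other folds. The abstract does not say which evaluation code was used, so the following is only a minimal sketch of that general protocol. It assumes a similarity score per face pair and splits the pairs into contiguous folds, whereas the official LFW protocol uses predefined folds; the function names are hypothetical.

```python
import numpy as np


def best_threshold(scores: np.ndarray, labels: np.ndarray) -> float:
    """Similarity threshold with the highest accuracy on the given pairs."""
    candidates = np.unique(scores)
    accs = [np.mean((scores >= t) == labels) for t in candidates]
    return float(candidates[int(np.argmax(accs))])


def kfold_verification_accuracy(scores, labels, n_folds: int = 10) -> float:
    """Mean held-out verification accuracy over n_folds.

    scores: similarity per pair (higher means more likely the same identity).
    labels: 1 for same-identity pairs, 0 for different-identity pairs.
    Folds here are contiguous slices (an assumption, not the official split).
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    folds = np.array_split(np.arange(len(scores)), n_folds)
    accs = []
    for k, test_idx in enumerate(folds):
        # Tune the threshold on every fold except the held-out one.
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != k])
        t = best_threshold(scores[train_idx], labels[train_idx])
        accs.append(np.mean((scores[test_idx] >= t) == labels[test_idx]))
    return float(np.mean(accs))
```

Tuning the threshold on held-out folds keeps the reported number from being optimistically biased by choosing the cut-off on the very pairs being scored.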
