Deep face recognition has improved remarkably with the introduction of margin-based softmax losses, in which each prototype stored in the last linear layer represents the center of a class. In these methods, training samples are forced to lie close to their positive prototype and to be separated from negative prototypes by a clear margin.
However, we argue that prototype learning employs only sample-to-prototype comparisons and ignores sample-to-sample comparisons during training, and that the resulting low loss value gives an illusion of perfect feature embedding, impeding further exploration by SGD. To this end, we propose Variational Prototype Learning (VPL), which represents every class as a distribution instead of a point in the latent space. By identifying the slow feature drift phenomenon, we directly inject memorized features into prototypes to approximate variational prototype sampling. The proposed VPL can simulate sample-to-sample comparisons within the classification framework, encouraging the SGD solver to be more exploratory while boosting performance. Moreover, VPL is conceptually simple, easy to implement, computationally efficient, and memory-saving. We present extensive experimental results on popular benchmarks, which demonstrate the superiority of the proposed VPL method over state-of-the-art competitors.
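The idea of injecting memorized features into prototypes can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the class `FeatureMemory`, the function `variational_prototypes`, the mixing weight `mix`, and the staleness window `max_age` are all hypothetical names and values chosen for illustration. The sketch assumes that, because features drift slowly, a recently stored feature of a class is still a usable stand-in for a sample of that class, so blending it into the class prototype makes a sample-to-prototype comparison behave partly like a sample-to-sample one.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit L2 norm."""
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norms, eps)


class FeatureMemory:
    """Hypothetical memory bank: one stored feature per class, with a time stamp."""

    def __init__(self, num_classes, dim):
        self.features = np.zeros((num_classes, dim))
        # -1 marks a class whose memory slot has never been written.
        self.last_step = np.full(num_classes, -1)

    def update(self, feats, labels, step):
        # Store the normalized mini-batch features in their class slots.
        self.features[labels] = l2_normalize(feats)
        self.last_step[labels] = step


def variational_prototypes(prototypes, memory, step, mix=0.15, max_age=100):
    """Blend each prototype with its memorized feature when that feature is recent.

    `mix` and `max_age` are assumed hyperparameters, not values from the source.
    """
    protos = l2_normalize(prototypes)
    # Only memory written within the last `max_age` steps is trusted,
    # since older features may have drifted too far.
    valid = (memory.last_step >= 0) & (step - memory.last_step <= max_age)
    mixed = protos.copy()
    mixed[valid] = (1.0 - mix) * protos[valid] + mix * memory.features[valid]
    return mixed


# Usage: three classes in a 3-D embedding space, one memorized feature for class 0.
prototypes = np.eye(3)
memory = FeatureMemory(num_classes=3, dim=3)
memory.update(np.array([[0.0, 1.0, 0.0]]), np.array([0]), step=5)
fresh = variational_prototypes(prototypes, memory, step=6, mix=0.5, max_age=10)
stale = variational_prototypes(prototypes, memory, step=100, mix=0.5, max_age=10)
```

In this sketch only class 0 has a recent memory entry at step 6, so only its prototype is shifted toward the stored feature; at step 100 the entry is considered stale and all prototypes are returned unchanged. A margin-based softmax loss would then be computed against the blended prototypes in place of the original ones.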