Zero Shot Learning (ZSL), a type of structured multi-output learning, has attracted much attention because it requires no training data for target classes. Conventional ZSL methods usually project visual features into a semantic space and assign labels by finding the nearest prototypes. However, such Nearest Neighbor Search (NNS) based methods often suffer severe performance degradation because the variances of different categories are nonuniform. In this paper, we propose a probabilistic framework that takes covariance into account to address this problem. In this framework, we define a new latent space with two characteristics. First, features in this space should gather within classes and scatter between classes, which is implemented by triplet learning. Second, the prototypes of unseen classes are synthesized with nonnegative coefficients, which are generated by Nonnegative Matrix Factorization (NMF) of the relations between seen and unseen classes in attribute space. During training, the learned parameters are the projection model of the triplet network and the nonnegative coefficients between unseen and seen classes. In the testing phase, visual features are projected into the latent space and assigned the label with the maximum probability, among unseen classes for classic ZSL or among all classes for Generalized ZSL. Extensive experiments are conducted on four popular datasets, and the results show that the proposed method outperforms state-of-the-art methods in most circumstances.
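The contrast between nearest-prototype assignment and a covariance-aware, maximum-probability assignment can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes each class in the latent space is modelled as a Gaussian with its own covariance (the abstract does not name the distribution), and the function names, prototypes, and covariances below are hypothetical toy values.

```python
import numpy as np


def nearest_prototype(z, prototypes):
    """Assign z to the class whose prototype is closest in Euclidean distance."""
    dists = np.linalg.norm(prototypes - z, axis=1)
    return int(np.argmin(dists))


def max_gaussian_probability(z, prototypes, covariances):
    """Assign z to the class with the highest Gaussian log-density.

    Assumption (not from the source): each class is a Gaussian centred on
    its prototype. The constant d*log(2*pi) term is dropped because it is
    identical for every class.
    """
    scores = []
    for mu, cov in zip(prototypes, covariances):
        diff = z - mu
        maha = diff @ np.linalg.solve(cov, diff)  # squared Mahalanobis distance
        _, logdet = np.linalg.slogdet(cov)        # log-determinant of the covariance
        scores.append(-0.5 * (maha + logdet))
    return int(np.argmax(scores))


# Hypothetical toy data: class 0 is widely spread, class 1 is tightly clustered.
prototypes = np.array([[0.0, 0.0], [4.0, 0.0]])
covariances = [np.diag([9.0, 1.0]), np.diag([0.25, 0.25])]
z = np.array([2.5, 0.0])

print("nearest prototype:", nearest_prototype(z, prototypes))
print("max probability:  ", max_gaussian_probability(z, prototypes, covariances))
```

With these toy values the point is closer to the prototype of class 1 in Euclidean terms, yet it is far more plausible under the broad class 0, so the two rules disagree. That disagreement is the nonuniform-variance effect the abstract attributes to NNS-based methods.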
Zero Shot Learning (ZSL) has been attracting increasing attention due to its ability to recognize objects of unseen classes. Among ZSL methods, the low rank based strategy has achieved remarkable success. However, traditional low rank based methods often assume that the varied visual features of the same class can be projected to a single attribute, ignoring background information and other noisy interference in the visual features. This assumption is unreasonable and often leads to poor performance when the variance within a class is large. In this paper, a novel method called Prototype Relaxation with Robust Principal Component Analysis (RPCA) is proposed to relax this assumption by adding a sparse noise constraint. In addition, to avoid confusion between similar classes, an orthogonal constraint is employed to disperse all class prototypes, of both seen and unseen classes, in the latent space. Furthermore, to alleviate the domain shift problem, vectors from the latent space are used to reconstruct both the visual features and the semantic attributes. The hubness problem is also mitigated by applying the max probability model in all three spaces. Extensive experiments are conducted on four popular datasets, and the results demonstrate the superiority of this method.
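The idea of pairing a low rank term with a sparse noise term can be sketched with generic RPCA, which splits a matrix M into a low rank part L and a sparse part S. The code below is a textbook principal component pursuit solver using an augmented Lagrangian iteration, not the Prototype Relaxation method itself; the function names and the default choices of `lam` and `mu` are common conventions adopted here as assumptions, not values from the source.

```python
import numpy as np


def soft_threshold(x, tau):
    """Elementwise shrinkage: proximal operator of the L1 norm."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)


def singular_value_threshold(x, tau):
    """Shrink singular values: proximal operator of the nuclear norm."""
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    return (u * np.maximum(s - tau, 0.0)) @ vt


def rpca(m, lam=None, mu=None, tol=1e-7, max_iter=1000):
    """Decompose m into low rank l and sparse s with m ~= l + s.

    Generic principal component pursuit; lam and mu defaults follow a
    common heuristic and are assumptions, not taken from the source.
    """
    n1, n2 = m.shape
    lam = 1.0 / np.sqrt(max(n1, n2)) if lam is None else lam
    mu = n1 * n2 / (4.0 * np.abs(m).sum()) if mu is None else mu
    s = np.zeros_like(m)
    y = np.zeros_like(m)  # Lagrange multipliers for the constraint m = l + s
    norm_m = np.linalg.norm(m)
    for _ in range(max_iter):
        l = singular_value_threshold(m - s + y / mu, 1.0 / mu)
        s = soft_threshold(m - l + y / mu, lam / mu)
        residual = m - l - s
        y = y + mu * residual
        if np.linalg.norm(residual) <= tol * norm_m:
            break
    return l, s
```

In the setting the abstract describes, the sparse term would absorb background and other noisy interference in the visual features, so that the low rank term is not forced to explain every sample of a class with a single point.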