2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr.2016.15

Latent Embeddings for Zero-Shot Classification

Abstract: We present a novel latent embedding model for learning a compatibility function between image and class embeddings in the context of zero-shot classification. The proposed method augments the state-of-the-art bilinear compatibility model by incorporating latent variables. Instead of learning a single bilinear map, it learns a collection of maps, with the selection of which map to use being a latent variable for the current image-class pair. We train the model with a ranking-based objective function which penalizes incorrect rankings of the true class for a given image.
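
For orientation, the following is a minimal numpy sketch of the latent compatibility idea described in the abstract: the score of an image-class pair is the maximum over K bilinear maps, and the index of the map that attains the maximum acts as the latent variable. The dimensions, number of maps, and random data are illustrative assumptions, not the authors' released code or training procedure.

```python
import numpy as np

def latent_compatibility(x, y, Ws):
    """F(x, y) = max_i x^T W_i y over a collection of bilinear maps W_1..W_K."""
    scores = np.array([x @ W @ y for W in Ws])
    return scores.max(), int(scores.argmax())   # best score and the latent map index

def predict(x, class_embeddings, Ws):
    """Assign the class whose embedding gives the highest latent compatibility."""
    scores = [latent_compatibility(x, y, Ws)[0] for y in class_embeddings]
    return int(np.argmax(scores))

# Toy usage with made-up sizes: 5 latent maps, 2048-d image features,
# 300-d class embeddings (e.g. attributes or word vectors).
rng = np.random.default_rng(0)
Ws = [rng.normal(scale=0.01, size=(2048, 300)) for _ in range(5)]
x = rng.normal(size=2048)                # image embedding
classes = rng.normal(size=(10, 300))     # one embedding per class
print(predict(x, classes, Ws))
```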

Cited by 613 publications (538 citation statements); References 33 publications.

Citation statements:
“…However, these methods involve the data of unseen classes to learn the model, which to some extent breaches the strict ZSL setting. Recent work [4], [33] combines the embedding-inferring procedure into a unified framework and empirically demonstrates better performance. The closest related work is [34], which goes one step further to synthesise classifiers for unseen classes.…”
Section: Related Work
confidence: 99%
“…These results are given for the data sets CUB, SUN, AWA1 and AWA2. We compare our approach with 12 leading GZSL methods, which are divided into three groups: semantic (SJE [24], ALE [25], LATEM [26], ES-ZSL [27], SYNC [12], DEVISE [2]), latent space learning (SAE [15], f-CLSWGAN [11], cycle-WGAN [3] and CADA-VAE [4]) and domain classification (CMT [6] and DAZSL [5]). The semantic group contains methods that use only the seen-class visual and semantic samples to learn a transformation function from the visual to the semantic space, and classification is based on nearest neighbour classification in that semantic space.…”
Section: Results
confidence: 99%
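
As a rough illustration of the "semantic group" described in the excerpt above, here is a minimal sketch of nearest-neighbour classification in semantic space: a projection W, learned from seen-class data only, maps visual features into the semantic space, and a test image is assigned the closest class embedding. The projection, the cosine similarity, and the sizes below are assumptions for illustration; the cited methods each learn the mapping differently.

```python
import numpy as np

def nn_classify(visual_feat, W, class_embeddings):
    """Project a visual feature into semantic space with a learned map W,
    then return the index of the most cosine-similar class embedding."""
    s = W @ visual_feat
    s = s / (np.linalg.norm(s) + 1e-12)
    C = class_embeddings / (np.linalg.norm(class_embeddings, axis=1, keepdims=True) + 1e-12)
    return int(np.argmax(C @ s))

# Toy usage: in ZSL, W is trained on seen classes and applied to the
# embeddings of unseen classes at test time.
rng = np.random.default_rng(1)
W = rng.normal(scale=0.01, size=(300, 2048))   # visual (2048-d) -> semantic (300-d)
feat = rng.normal(size=2048)
unseen_classes = rng.normal(size=(5, 300))     # embeddings of unseen classes only
print(nn_classify(feat, W, unseen_classes))
```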
“…In addition, non-linear compatibility mapping models have also been proposed. LATEM [6] proposes piecewise compatibility model learning, which learns a nonlinear compatibility function, and CMT [15] trains a neural network with two hidden layers to learn a nonlinear mapping from the image feature space to the word2vec space. DEM [5] argues that the image feature space is more discriminative than the semantic space, and thus proposes an end-to-end deep embedding model which maps from the semantic space into the image feature space.…”
Section: Linear and Nonlinear Embedding Models
confidence: 99%
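
To make the contrast with linear and bilinear compatibility concrete, below is a minimal numpy sketch of a nonlinear mapping with two hidden layers from the image feature space into a word-vector space, in the spirit of the CMT-style approach mentioned in the excerpt above. The layer widths, tanh activations, and random parameters are illustrative assumptions rather than the architecture of any specific cited paper.

```python
import numpy as np

def nonlinear_map(x, params):
    """Two-hidden-layer nonlinear mapping from image feature space
    to word-vector (semantic) space."""
    W1, b1, W2, b2, W3, b3 = params
    h1 = np.tanh(W1 @ x + b1)
    h2 = np.tanh(W2 @ h1 + b2)
    return W3 @ h2 + b3          # predicted word-vector embedding

# Toy parameters; in practice they are trained on seen classes so that the
# output lands near the word vector of the ground-truth class.
rng = np.random.default_rng(2)
sizes = [2048, 512, 512, 300]
params = []
for d_in, d_out in zip(sizes[:-1], sizes[1:]):
    params += [rng.normal(scale=0.01, size=(d_out, d_in)), np.zeros(d_out)]
print(nonlinear_map(rng.normal(size=2048), params).shape)   # -> (300,)
```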