Zero-Shot Visual Recognition Using Semantics-Preserving Adversarial Embedding Networks

Chen, Long; Zhang, Hanwang; Xiao, Jun; Liu, Wei; Chang, Shih‐Fu

doi:10.1109/cvpr.2018.00115

Cited by 264 publications

(176 citation statements)

References 53 publications

Supporting

Mentioning

172

Contrasting

“…We provide a general summary of the methods presented in [2], and encourage the reader to study that paper in order to obtain more details on previous works. The majority of the ZSL and GZSL methods tend to compensate the lack of visual representation of the unseen classes with the learning of a mapping between visual and semantic spaces [16], [17]. For instance, a fairly successful approach is based on a bi-linear compatibility function that associates visual representation and semantic features.…”

Section: Literature Review (mentioning)

confidence: 99%
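
The bi-linear compatibility function mentioned in the statement above can be illustrated with a minimal sketch. It assumes the common form F(x, c) = x^T W a_c, where x is a visual feature, a_c is the semantic embedding of class c, and W is a learned matrix. The function names, shapes, and toy values below are hypothetical and are not taken from any of the cited papers.

```python
import numpy as np

def bilinear_scores(x, W, class_embeddings):
    """Return F(x, c) = x^T W a_c for every class c.

    x: (d_visual,) visual feature
    W: (d_visual, d_semantic) compatibility matrix (assumed already learned)
    class_embeddings: (n_classes, d_semantic) semantic vectors a_c
    """
    return x @ W @ class_embeddings.T

def predict(x, W, class_embeddings):
    """Pick the class whose semantic embedding is most compatible with x."""
    return int(np.argmax(bilinear_scores(x, W, class_embeddings)))

# Toy values chosen only for illustration.
W = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]])
class_embeddings = np.array([[1.0, 0.0],   # class 0
                             [0.0, 1.0]])  # class 1
x = np.array([1.0, 0.0, 0.0])

scores = bilinear_scores(x, W, class_embeddings)
label = predict(x, W, class_embeddings)
```

In a zero-shot setting the rows of class_embeddings would include unseen classes, so the same W can score classes that had no training images.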

Multi-modal Cycle-Consistent Generalized Zero-Shot Learning

Felix, Kumar, Reid, et al. 2018

Lecture Notes in Computer Science

315

328

In generalized zero-shot learning (GZSL), the set of classes is split into seen and unseen classes, where training relies on the semantic features of the seen and unseen classes and the visual representations of only the seen classes, while testing uses the visual representations of the seen and unseen classes. Current methods address GZSL by learning a transformation from the visual to the semantic space, exploring the assumption that the distribution of classes in the semantic and visual spaces is relatively similar. Such methods tend to transform unseen testing visual representations into one of the seen classes' semantic features instead of the semantic features of the correct unseen class, resulting in low accuracy GZSL classification. Recently, generative adversarial networks (GAN) have been explored to synthesize visual representations of the unseen classes from their semantic features; the synthesized representations of the seen and unseen classes are then used to train the GZSL classifier. This approach has been shown to boost GZSL classification accuracy, but there is one important missing constraint: there is no guarantee that synthetic visual representations can generate back their semantic feature in a multi-modal cycle-consistent manner. This missing constraint can result in synthetic visual representations that do not represent their semantic features well, which means that the use of this constraint can improve GAN-based approaches. In this paper, we propose the use of such a constraint based on a new regularization for the GAN training that forces the generated visual features to reconstruct their original semantic features. Once our model is trained with this multi-modal cycle-consistent semantic compatibility, we can then synthesize more representative visual representations for the seen and, more importantly, for the unseen classes. Our proposed approach shows the best GZSL classification results in the field on several publicly available datasets.
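
The cycle-consistency constraint described in the abstract above can be sketched as a reconstruction penalty: generate a visual feature from a semantic vector, map it back, and penalize the difference. The sketch below assumes linear maps for both the generator and the regressor and a squared-error penalty. These choices, and all names in the code, are illustrative assumptions rather than the architecture or loss used in the cited paper.

```python
import numpy as np

def generate(a, z, G):
    """Hypothetical linear generator: visual features from [semantic, noise]."""
    return np.concatenate([a, z], axis=1) @ G

def regress(x, R):
    """Hypothetical linear regressor: visual features back to semantic space."""
    return x @ R

def cycle_loss(a, z, G, R):
    """Mean squared error between original and reconstructed semantic vectors."""
    a_rec = regress(generate(a, z, G), R)
    return float(np.mean(np.sum((a_rec - a) ** 2, axis=1)))

# Toy batch: two classes with 2-d semantic vectors and 1-d noise.
a = np.array([[1.0, 0.0],
              [0.0, 2.0]])
z = np.array([[0.5],
              [-0.5]])

G = np.eye(3, 4)           # copies [a, z] into a 4-d "visual" feature
R_good = np.eye(4, 2)      # reads the semantic part back out exactly
R_bad = np.zeros((4, 2))   # discards all semantic information

loss_good = cycle_loss(a, z, G, R_good)
loss_bad = cycle_loss(a, z, G, R_bad)
```

In GAN training this term would be added to the adversarial objective as a regularizer, so that generated features stay informative about the class semantics they were conditioned on.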

“…and compute $M_0$ and $M_c$ by equations (11) and (12). 6: Compute $\beta$ by solving equation (16) and obtain $f$ via the representer theorem in equation (8). 7: Update the soft labels of $D_t$: $\hat{y}_t = f(Z_t)$.…”

Section: Algorithm 1 Manifold Embedded Distribution Alignment (mentioning)

confidence: 99%

“…The rapid growth of online media and content sharing applications has stimulated a great demand for automatic recognition and analysis for images and other multimedia data [8,20]. Unfortunately, it is often expensive and time-consuming to acquire sufficient labeled data to train machine learning models.…”

Section: Introduction (mentioning)

confidence: 99%

Visual Domain Adaptation with Manifold Embedded Distribution Alignment

Wang, Feng, Chen, et al. 2018

Proceedings of the 26th ACM International Conference on Multimedia

507

292

Visual domain adaptation aims to learn robust classifiers for the target domain by leveraging knowledge from a source domain. Existing methods either attempt to align the cross-domain distributions, or perform manifold subspace learning. However, there are two significant challenges: (1) degenerated feature transformation, which means that distribution alignment is often performed in the original feature space, where feature distortions are hard to overcome. On the other hand, subspace learning is not sufficient to reduce the distribution divergence. (2) unevaluated distribution alignment, which means that existing distribution alignment methods only align the marginal and conditional distributions with equal importance, while they fail to evaluate the different importance of these two distributions in real applications. In this paper, we propose a Manifold Embedded Distribution Alignment (MEDA) approach to address these challenges. MEDA learns a domain-invariant classifier in Grassmann manifold with structural risk minimization, while performing dynamic distribution alignment to quantitatively account for the relative importance of marginal and conditional distributions. To the best of our knowledge, MEDA is the first attempt to perform dynamic distribution alignment for manifold domain adaptation. Extensive experiments demonstrate that MEDA shows significant improvements in classification accuracy compared to state-of-the-art traditional and deep methods. * The first two authors contributed equally. † J. Wang and Y. Chen are also affiliated with Beijing Key Lab. of Mobile Computing and Pervasive Devices. W. Feng is also with CAS Key Lab. of Network Data Science & Technology. J. Wang and W. Feng are also affiliated with University of Chinese Academy of Sciences. ‡ P. Yu is also affiliated

“…According to [7], aPY [10] has a much smaller cosine similarity (0.58) between the attribute variances of the disjoint train and test images than the other datasets (0.98 for SUN, 0.95 for CUB, 0.74 for AwA2), which means it is harder to synthesize and classify images of unseen classes. Although previous methods have relatively low accuracy for unseen classes, our performance gain is even higher with such a difficult dataset.…”

Section: Results (mentioning)

confidence: 99%
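
The statement above compares datasets by the cosine similarity between the attribute variances of train and test images. One plausible reading, sketched below, takes the per-attribute variance over each split and computes the cosine of the angle between the two variance vectors. The exact procedure used in [7] is not given here, so the function and the toy arrays are assumptions made only for illustration.

```python
import numpy as np

def attribute_variance_similarity(train_attrs, test_attrs):
    """Cosine similarity between per-attribute variance vectors.

    train_attrs, test_attrs: (n_images, n_attributes) arrays.
    Assumes neither split has all-zero variance (otherwise the norm is 0).
    """
    v_train = train_attrs.var(axis=0)
    v_test = test_attrs.var(axis=0)
    denom = np.linalg.norm(v_train) * np.linalg.norm(v_test)
    return float(v_train @ v_test / denom)

# Toy splits: variance sits on attribute 0 in one and attribute 1 in the other.
train = np.array([[0.0, 0.0],
                  [2.0, 0.0]])
test = np.array([[0.0, 0.0],
                 [0.0, 2.0]])

sim_disjoint = attribute_variance_similarity(train, test)
sim_same = attribute_variance_similarity(train, train)
```

Under this reading, a value near 1 means the attributes that vary most among training images also vary most among test images, while a lower value (such as the 0.58 quoted for aPY) indicates a larger shift between the two splits.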

Generative Dual Adversarial Network for Generalized Zero-Shot Learning

Huang, Wang, et al. 2019

2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

225

183

This paper studies the problem of generalized zero-shot learning which requires the model to train on image-label pairs from some seen classes and test on the task of classifying new images from both seen and unseen classes. Most previous models try to learn a fixed one-directional mapping between visual and semantic space, while some recently proposed generative methods try to generate image features for unseen classes so that the zero-shot learning problem becomes a traditional fully-supervised classification problem. In this paper, we propose a novel model that provides a unified framework for three different approaches: visual → semantic mapping, semantic → visual mapping, and metric learning. Specifically, our proposed model consists of a feature generator that can generate various visual features given class embedding features as input, a regressor that maps each visual feature back to its corresponding class embedding, and a discriminator that learns to evaluate the closeness of an image feature and a class embedding. All three components are trained under the combination of cyclic consistency loss and dual adversarial loss. Experimental results show that our model not only preserves higher accuracy in classifying images from seen classes, but also performs better than existing state-of-the-art models in classifying images from unseen classes.
