2015 IEEE International Conference on Computer Vision (ICCV)
DOI: 10.1109/iccv.2015.483

Predicting Deep Zero-Shot Convolutional Neural Networks Using Textual Descriptions

Abstract: One of the main challenges in Zero-Shot Learning of visual categories is gathering semantic attributes to accompany images. Recent work has shown that learning from textual descriptions, such as Wikipedia articles, avoids the problem of having to explicitly define these attributes. We present a new model that can classify unseen categories from their textual description. Specifically, we use text features to predict the output weights of both the convolutional and the fully connected layers in a deep convolutional neural network…
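As a rough illustration of the abstract's core idea (predicting classifier weights from text features), here is a minimal PyTorch sketch. It is not the paper's architecture: the paper predicts weights for convolutional layers as well, while this sketch only maps a text embedding to final-layer classifier weights. The TextToWeights module, the mapping g, and all dimensions are hypothetical.

```python
import torch
import torch.nn as nn

class TextToWeights(nn.Module):
    """Map a text embedding (e.g., TF-IDF of a Wikipedia article) to a
    classifier weight vector for an unseen class.
    Illustrative sketch only; names and dimensions are assumptions."""
    def __init__(self, text_dim=8000, feat_dim=4096):
        super().__init__()
        # g(t): text embedding -> weight vector in visual-feature space
        self.g = nn.Sequential(
            nn.Linear(text_dim, 512),
            nn.Tanh(),
            nn.Linear(512, feat_dim),
        )

    def forward(self, text_emb, image_feat):
        # text_emb:   (C, text_dim)  one row per unseen-class description
        # image_feat: (N, feat_dim)  CNN features for N images
        w = self.g(text_emb)        # (C, feat_dim) predicted classifier weights
        return image_feat @ w.t()   # (N, C) class scores

# Usage: score 10 images against 5 unseen classes described by text.
model = TextToWeights()
scores = model(torch.randn(5, 8000), torch.randn(10, 4096))
print(scores.argmax(dim=1))  # predicted unseen-class index per image
```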

Cited by 324 publications (309 citation statements). References 22 publications.
“…ZSL requires by definition additional information (e.g., semantic descriptions of unseen classes) to enable their recognition. Considerable progress has been made in studying attribute representations [27,28,2,15,61,59,29,3,43,1]. Attributes are a collection of semantic characteristics that are specified to uniquely describe unseen classes.…”
Section: Related Work
confidence: 99%
“…Attributes are a collection of semantic characteristics that are specified to uniquely describe unseen classes. Another ZSL trend is to use online textual descriptions [11,12,39,41,29]. Textual descriptions can be extracted easily from online sources such as Wikipedia with minimal overhead, avoiding the need to define hundreds of attributes and fill them in for each class/image.…”
Section: Related Work
confidence: 99%
“…Since textual sources are relatively easy to obtain, [14], [20] propose to estimate the semantic relatedness of the novel classes from text. [13], [36], [36] learn pseudo-concepts to associate novel classes using Wikipedia articles. Recently, lexical hierarchies from ontology engineering have also been exploited to find the relationships between classes [37], [38], [39].…”
Section: Related Work
confidence: 99%
“…Since the seen and unseen objects are connected only in the semantic space, and the unseen objects must be recognized from their visual features, zero-shot learning methods generally learn a visual-semantic embedding from the seen samples. At the zero-shot classification stage, unseen samples are projected into the semantic space and labeled by semantic attributes [5,15,16,29]. Instead of learning a visual-semantic embedding, some previous works propose to learn a semantic-visual mapping so that unseen samples can be represented by seen ones [12,30].…”
Section: Zero-Shot Learning
confidence: 99%
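The visual-semantic embedding pipeline summarized in the excerpt above can be sketched in a few lines of NumPy. This is an assumption-laden illustration, not any cited paper's method: it uses a closed-form ridge regression as the embedding, and all data, dimensions, and the classify helper are synthetic placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
feat_dim, attr_dim = 2048, 85

# Seen-class training data: visual features X with attribute targets S
# (each image's target is its class's attribute vector). Synthetic here.
X = rng.normal(size=(1000, feat_dim))
S = rng.normal(size=(1000, attr_dim))

# Ridge-regression embedding W so that X @ W ~= S (closed form).
lam = 1.0
W = np.linalg.solve(X.T @ X + lam * np.eye(feat_dim), X.T @ S)

# Unseen classes are defined only by their attribute vectors A (C, attr_dim).
A = rng.normal(size=(10, attr_dim))

def classify(x):
    """Project an unseen sample into attribute space, return the index of
    the nearest unseen class by cosine similarity."""
    s = x @ W  # (attr_dim,)
    sims = A @ s / (np.linalg.norm(A, axis=1) * np.linalg.norm(s) + 1e-12)
    return int(np.argmax(sims))

print(classify(rng.normal(size=feat_dim)))
```

The semantic-visual mapping mentioned at the end of the excerpt simply reverses the regression direction (attributes to features) and does the nearest-neighbor search in visual-feature space instead.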