2015 IEEE International Conference on Computer Vision (ICCV)
DOI: 10.1109/iccv.2015.483

Predicting Deep Zero-Shot Convolutional Neural Networks Using Textual Descriptions

Abstract: One of the main challenges in Zero-Shot Learning of visual categories is gathering semantic attributes to accompany images. Recent work has shown that learning from textual descriptions, such as Wikipedia articles, avoids the problem of having to explicitly define these attributes. We present a new model that can classify unseen categories from their textual description. Specifically, we use text features to predict the output weights of both the convolutional and the fully connected layers in a deep convolutional neural network…
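As a rough illustration of the abstract's core idea (predicting classifier weights from text features), here is a minimal PyTorch sketch. It is not the paper's architecture: the paper predicts weights for convolutional layers as well, while this sketch only maps a text embedding to final-layer classifier weights. The TextToWeights module, the mapping g, and all dimensions are hypothetical.

```python
import torch
import torch.nn as nn

class TextToWeights(nn.Module):
    """Map a text embedding (e.g., TF-IDF of a Wikipedia article) to a
    classifier weight vector for an unseen class.
    Illustrative sketch only; names and dimensions are assumptions."""
    def __init__(self, text_dim=8000, feat_dim=4096):
        super().__init__()
        # g(t): text embedding -> weight vector in visual-feature space
        self.g = nn.Sequential(
            nn.Linear(text_dim, 512),
            nn.Tanh(),
            nn.Linear(512, feat_dim),
        )

    def forward(self, text_emb, image_feat):
        # text_emb:   (C, text_dim)  one row per unseen-class description
        # image_feat: (N, feat_dim)  CNN features for N images
        w = self.g(text_emb)        # (C, feat_dim) predicted classifier weights
        return image_feat @ w.t()   # (N, C) class scores

# Usage: score 10 images against 5 unseen classes described by text.
model = TextToWeights()
scores = model(torch.randn(5, 8000), torch.randn(10, 4096))
print(scores.argmax(dim=1))  # predicted unseen-class index per image
```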

Cited by 324 publications (309 citation statements). References 22 publications.
“…ZSL requires by definition additional information (e.g., semantic descriptions of unseen classes) to enable their recognition. Considerable progress has been made in studying attribute representations [27,28,2,15,61,59,29,3,43,1]. Attributes are a collection of semantic characteristics that are specified to uniquely describe unseen classes.…”
Section: Related Work
confidence: 99%
“…Attributes are a collection of semantic characteristics that are specified to uniquely describe unseen classes. Another ZSL trend is to use online textual descriptions [11,12,39,41,29]. Textual descriptions can be extracted easily from online sources such as Wikipedia with minimal overhead, avoiding the need to define hundreds of attributes and fill them in for each class/image.…”
Section: Related Work
confidence: 99%
“…Since textual sources are relatively easy to obtain, [14], [20] propose to estimate the semantic relatedness of the novel classes from text. [13], [36], [36] learn pseudo-concepts to associate novel classes using Wikipedia articles. Recently, lexical hierarchies from ontology engineering have also been exploited to find the relationships between classes [37], [38], [39].…”
Section: Related Work
confidence: 99%
“…Since the seen and unseen objects are connected only in the semantic space, and the unseen objects must be recognized from their visual features, zero-shot learning methods generally learn a visual-semantic embedding from the seen samples. At the zero-shot classification stage, unseen samples are projected into the semantic space and labeled by semantic attributes [5,15,16,29]. Instead of learning a visual-semantic embedding, some previous works propose to learn a semantic-visual mapping so that unseen samples can be represented by seen ones [12,30].…”
Section: Zero-Shot Learning
confidence: 99%
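The visual-semantic embedding pipeline summarized in the excerpt above can be sketched in a few lines of NumPy. This is an assumption-laden illustration, not any cited paper's method: it uses a closed-form ridge regression as the embedding, and all data, dimensions, and the classify helper are synthetic placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
feat_dim, attr_dim = 2048, 85

# Seen-class training data: visual features X with attribute targets S
# (each image's target is its class's attribute vector). Synthetic here.
X = rng.normal(size=(1000, feat_dim))
S = rng.normal(size=(1000, attr_dim))

# Ridge-regression embedding W so that X @ W ~= S (closed form).
lam = 1.0
W = np.linalg.solve(X.T @ X + lam * np.eye(feat_dim), X.T @ S)

# Unseen classes are defined only by their attribute vectors A (C, attr_dim).
A = rng.normal(size=(10, attr_dim))

def classify(x):
    """Project an unseen sample into attribute space, return the index of
    the nearest unseen class by cosine similarity."""
    s = x @ W  # (attr_dim,)
    sims = A @ s / (np.linalg.norm(A, axis=1) * np.linalg.norm(s) + 1e-12)
    return int(np.argmax(sims))

print(classify(rng.normal(size=feat_dim)))
```

The semantic-visual mapping mentioned at the end of the excerpt simply reverses the regression direction (attributes to features) and does the nearest-neighbor search in visual-feature space instead.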