2018
DOI: 10.1007/s11263-018-1140-0
Semantic Understanding of Scenes Through the ADE20K Dataset

Abstract: Semantic understanding of visual scenes is one of the holy grails of computer vision. Despite efforts of the community in data collection, there are still few image datasets covering a wide range of scenes and object categories with pixel-wise annotations for scene understanding. In this work, we present a densely annotated dataset ADE20K, which spans diverse annotations of scenes, objects, parts of objects, and in some cases even parts of parts. Totally there are 25k images of the complex everyday scenes cont…


Cited by 1,094 publications (837 citation statements)
References 35 publications
“…To accomplish this, we needed a large dataset of natural images in which all object occurrences were labeled. We took advantage of the recently created ADE20K database, which contains 22,210 annotated scenes in which every object has been manually labeled by an expert human annotator [25]. One approach for characterizing the co-occurrence statistics of this dataset would be to simply construct a matrix of co-occurrence frequencies for all pairwise comparisons of objects.…”
Section: Object Embeddingsmentioning
confidence: 99%
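The pairwise co-occurrence matrix this passage describes can be built directly from per-image label lists. A minimal sketch (the toy scene lists and function name are hypothetical illustrations, not the ADE20K annotation format):

```python
# Hypothetical sketch: pairwise object co-occurrence counts from
# per-image object-label lists. Scene data below is illustrative only.
from collections import Counter
from itertools import combinations

def cooccurrence_counts(scenes):
    """scenes: iterable of lists of object labels, one list per image."""
    counts = Counter()
    for labels in scenes:
        # Count each unordered pair of distinct labels once per image.
        for a, b in combinations(sorted(set(labels)), 2):
            counts[(a, b)] += 1
    return counts

scenes = [
    ["wall", "bed", "lamp"],
    ["wall", "bed", "window"],
    ["wall", "sofa", "lamp"],
]
counts = cooccurrence_counts(scenes)
# ("bed", "wall") co-occurs in 2 of the 3 toy scenes
```

Deduplicating labels per image (`set`) counts presence rather than instance multiplicity; whether to count instances is a design choice the passage leaves open.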
“…In the field of computational linguistics, there is a long history of modeling word co-occurrence data in language corpora with dense, lower-dimensional representations [26]. This modeling framework, known as distributional semantics, has proved highly useful. We applied it to ADE20K [25], which contains 22,210 images in which every pixel is associated with an object label provided by an expert human annotator. An adaptation of the word2vec machine-learning algorithm for distributional semantics, which we call object2vec, was applied to this corpus of image annotations to model the statistical regularities of object-label co-occurrence in a large sample of real-world scenes.…”
Section: Object Embeddingsmentioning
confidence: 99%
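The distributional-semantics idea in this passage can be sketched with a count-based stand-in for word2vec: build the object co-occurrence matrix and factor it with a truncated SVD, so objects that appear in similar scenes receive similar vectors. The toy scenes, the SVD substitution, and all names below are assumptions for illustration, not the authors' object2vec pipeline:

```python
# Hedged sketch of the distributional-semantics idea behind "object2vec".
# word2vec is replaced here by a simpler count-based model (truncated SVD
# of a co-occurrence matrix); toy data and names are illustrative only.
import numpy as np

scenes = [
    ["wall", "bed", "lamp"],
    ["wall", "bed", "pillow"],
    ["bed", "lamp"],
    ["road", "car", "tree"],
    ["road", "car", "building"],
]

labels = sorted({l for s in scenes for l in s})
index = {l: i for i, l in enumerate(labels)}

# Symmetric object-by-object co-occurrence matrix.
C = np.zeros((len(labels), len(labels)))
for s in scenes:
    for a in s:
        for b in s:
            if a != b:
                C[index[a], index[b]] += 1

# Low-dimensional object embeddings via truncated SVD.
U, S, _ = np.linalg.svd(C)
k = 2
emb = U[:, :k] * S[:k]

def similarity(a, b):
    """Cosine similarity between two object embeddings."""
    va, vb = emb[index[a]], emb[index[b]]
    return float(va @ vb / (np.linalg.norm(va) * np.linalg.norm(vb) + 1e-9))

# Objects from the same scene type ("bed"/"lamp") come out more similar
# than objects from different scene types ("bed"/"car").
```

A neural skip-gram model, as word2vec uses, would learn embeddings from the same label-list "sentences"; the SVD variant above just makes the same intuition runnable with no training loop.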
“…Therefore, in this analysis, we investigate whether units corresponding to the free space show a higher correlation with the behavior and brain RDMs than the readout layer of the VGG scene-parse network. The readout layer of the VGG scene-parse network consists of 151 channels, with 150 channels each containing an output corresponding to a particular class in the ADE20k [26] dataset and 1 channel corresponding to the background. Therefore, it is straightforward to separate specific category activation from the readout layer.…”
mentioning
confidence: 99%
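Separating a single class's activation from such a readout layer is a matter of channel indexing. A minimal sketch, assuming a (channels, H, W) layout and a background-last channel ordering (both assumptions; the passage does not specify them):

```python
# Sketch of separating per-class activations from a 151-channel readout
# layer (150 ADE20K classes + 1 background), as the passage describes.
# The (channels, H, W) layout and background-last order are assumptions.
import numpy as np

n_classes = 150  # one channel per ADE20K class; channel 150 = background
readout = np.random.rand(n_classes + 1, 8, 8)  # dummy activation maps

def class_activation(readout, class_idx):
    """Return the spatial activation map for a single class channel."""
    if not 0 <= class_idx < n_classes:
        raise IndexError("class channel out of range")
    return readout[class_idx]

# Per-pixel argmax over all 151 channels yields a segmentation map.
pred = readout.argmax(axis=0)
```

Because each class owns exactly one channel, no decoding step is needed: slicing the channel axis is the whole "separation" the passage calls straightforward.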