2017
DOI: 10.1109/tpami.2016.2635138
Visually Grounded Meaning Representations

Abstract: In this paper we address the problem of grounding distributional representations of lexical meaning. We introduce a new model which uses stacked autoencoders to learn higher-level representations from textual and visual input. The visual modality is encoded via vectors of attributes obtained automatically from images. We create a new large-scale taxonomy of 600 visual attributes representing more than 500 concepts and 700K images. We use this dataset to train attribute classifiers and integrate their …
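The abstract sketches an architecture: per-modality inputs fused through stacked autoencoders into a single higher-level representation. Below is a minimal NumPy sketch of that general idea, not the authors' implementation; the layer widths, the tanh nonlinearity, the squared-error reconstruction objective, and full-batch gradient descent are all illustrative assumptions, and the random matrices stand in for real textual vectors and visual-attribute vectors.

```python
# Minimal sketch of a bimodal stacked autoencoder (illustrative assumptions
# throughout; this is NOT the paper's exact model or training procedure).
import numpy as np

class Autoencoder:
    """One-hidden-layer autoencoder trained with plain gradient descent."""
    def __init__(self, n_in, n_hidden, seed=0):
        rng = np.random.default_rng(seed)
        s = 1.0 / np.sqrt(n_in)
        self.W1 = rng.uniform(-s, s, (n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.uniform(-s, s, (n_hidden, n_in))
        self.b2 = np.zeros(n_in)

    def encode(self, X):
        return np.tanh(X @ self.W1 + self.b1)

    def fit(self, X, epochs=200, lr=0.1):
        n = X.size  # total entry count, for a mean-squared-error loss
        for _ in range(epochs):
            H = self.encode(X)
            R = H @ self.W2 + self.b2            # linear reconstruction
            G_r = 2.0 * (R - X) / n              # dL/dR for the MSE loss
            G_W2 = H.T @ G_r
            G_b2 = G_r.sum(axis=0)
            G_h = (G_r @ self.W2.T) * (1.0 - H ** 2)  # backprop through tanh
            G_W1 = X.T @ G_h
            G_b1 = G_h.sum(axis=0)
            self.W1 -= lr * G_W1
            self.b1 -= lr * G_b1
            self.W2 -= lr * G_W2
            self.b2 -= lr * G_b2
        return self

# Toy stand-ins for the two modalities (real inputs would be distributional
# text vectors and per-concept visual-attribute vectors).
rng = np.random.default_rng(1)
text_vecs = rng.normal(size=(500, 100))   # hypothetical textual vectors
attr_vecs = rng.normal(size=(500, 600))   # hypothetical attribute vectors

txt_ae = Autoencoder(100, 50).fit(text_vecs)
vis_ae = Autoencoder(600, 50).fit(attr_vecs)

# Stacking step: a joint autoencoder over the concatenated unimodal codes
# yields one grounded multimodal representation per concept.
joint_in = np.hstack([txt_ae.encode(text_vecs), vis_ae.encode(attr_vecs)])
joint_ae = Autoencoder(100, 60).fit(joint_in)
grounded = joint_ae.encode(joint_in)      # (500, 60) multimodal embeddings
```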

Cited by 42 publications (44 citation statements). References 65 publications (122 reference statements).
“…These weights ought to correlate with human knowledge better than the approximation given by linguistic patterns or by perceptual information alone, and it appears that they do (A. J. Anderson, Bruni, Bordignon, Poesio, & Baroni, 2013; Bruni, Boleda, Baroni, & Tran, 2012; Bruni, Tran, & Baroni, 2014; Frome et al., 2013; Silberer, Ferrari, & Lapata, 2016). This work, still in its early stages, is highly promising for providing further insights into what kinds of knowledge perception and language offer to the learner.…”
Section: What Kinds Of Semantic Knowledge Can We Learn From Language
confidence: 94%
“…Concreteness in datasets has been previously studied in either text-only cases (Turney et al., 2011; Hill et al., 2013) or by incorporating human judgments of perception into models (Silberer and Lapata, 2012; Hill and Korhonen, 2014a). Other work has quantified characteristics of concreteness in multimodal datasets (Young et al., 2014; Hill and Korhonen, 2014b; Kiela and Bottou, 2014; Jas and Parikh, 2015; Lazaridou et al., 2015; Silberer et al., 2016; Lu et al., 2017; Bhaskar et al., 2017). Most related to our work is that of ; the authors use Google image search to collect 50 images each for a variety of words and compute the average cosine similarity between vector representations of returned images.…”
Section: Related Work
confidence: 99%
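The measure quoted above, averaging cosine similarity over the image vectors retrieved for a word, is simple to state precisely. Here is a small sketch, assuming each word's images have already been mapped to a feature matrix; the `image_vecs` input format is a hypothetical convenience, not the cited authors' exact pipeline:

```python
# Sketch of the quoted measure: mean pairwise cosine similarity among the
# image vectors retrieved for one query word. Higher values suggest the
# word's images are visually homogeneous (a proxy for concreteness).
import numpy as np

def mean_pairwise_cosine(image_vecs: np.ndarray) -> float:
    """image_vecs: (n_images, dim) feature matrix for one query word."""
    norms = np.linalg.norm(image_vecs, axis=1, keepdims=True)
    unit = image_vecs / np.clip(norms, 1e-12, None)  # L2-normalize rows
    sims = unit @ unit.T                             # all pairwise cosines
    n = len(image_vecs)
    # Average over the n*(n-1) off-diagonal pairs only (exclude self-similarity).
    return float((sims.sum() - np.trace(sims)) / (n * (n - 1)))
```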
“…In the vision community, image captioning has received much recent attention, where the goal is to produce a fluent and informative natural language description for a visual scene [27-30]. In natural language processing, images have also been used to capture aspects of meaning (semantics) of written language; see [31,32] for reviews. Other studies have considered multimodal modelling of sounds (not speech) with text and images [33-35], and phonemes with images [36].…”
Section: Related Work
confidence: 99%