2017
DOI: 10.1109/tpami.2016.2635138
Visually Grounded Meaning Representations

Abstract: In this paper we address the problem of grounding distributional representations of lexical meaning. We introduce a new model which uses stacked autoencoders to learn higher-level representations from textual and visual input. The visual modality is encoded via vectors of attributes obtained automatically from images. We create a new large-scale taxonomy of 600 visual attributes representing more than 500 concepts and 700K images. We use this dataset to train attribute classifiers and integrate their …
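The abstract sketches an architecture: per-modality inputs fused through stacked autoencoders into a single higher-level representation. Below is a minimal NumPy sketch of that general idea, not the authors' implementation; the layer widths, the tanh nonlinearity, the squared-error reconstruction objective, and full-batch gradient descent are all illustrative assumptions, and the random matrices stand in for real textual vectors and visual-attribute vectors.

```python
# Minimal sketch of a bimodal stacked autoencoder (illustrative assumptions
# throughout; this is NOT the paper's exact model or training procedure).
import numpy as np

class Autoencoder:
    """One-hidden-layer autoencoder trained with plain gradient descent."""
    def __init__(self, n_in, n_hidden, seed=0):
        rng = np.random.default_rng(seed)
        s = 1.0 / np.sqrt(n_in)
        self.W1 = rng.uniform(-s, s, (n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.uniform(-s, s, (n_hidden, n_in))
        self.b2 = np.zeros(n_in)

    def encode(self, X):
        return np.tanh(X @ self.W1 + self.b1)

    def fit(self, X, epochs=200, lr=0.1):
        n = X.size  # total entry count, for a mean-squared-error loss
        for _ in range(epochs):
            H = self.encode(X)
            R = H @ self.W2 + self.b2            # linear reconstruction
            G_r = 2.0 * (R - X) / n              # dL/dR for the MSE loss
            G_W2 = H.T @ G_r
            G_b2 = G_r.sum(axis=0)
            G_h = (G_r @ self.W2.T) * (1.0 - H ** 2)  # backprop through tanh
            G_W1 = X.T @ G_h
            G_b1 = G_h.sum(axis=0)
            self.W1 -= lr * G_W1
            self.b1 -= lr * G_b1
            self.W2 -= lr * G_W2
            self.b2 -= lr * G_b2
        return self

# Toy stand-ins for the two modalities (real inputs would be distributional
# text vectors and per-concept visual-attribute vectors).
rng = np.random.default_rng(1)
text_vecs = rng.normal(size=(500, 100))   # hypothetical textual vectors
attr_vecs = rng.normal(size=(500, 600))   # hypothetical attribute vectors

txt_ae = Autoencoder(100, 50).fit(text_vecs)
vis_ae = Autoencoder(600, 50).fit(attr_vecs)

# Stacking step: a joint autoencoder over the concatenated unimodal codes
# yields one grounded multimodal representation per concept.
joint_in = np.hstack([txt_ae.encode(text_vecs), vis_ae.encode(attr_vecs)])
joint_ae = Autoencoder(100, 60).fit(joint_in)
grounded = joint_ae.encode(joint_in)      # (500, 60) multimodal embeddings
```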

Cited by 42 publications (44 citation statements). References 65 publications (122 reference statements).
“…These weights ought to correlate with human knowledge better than the approximation given by linguistic patterns or by perceptual information alone, and it appears that they do (A. J. Anderson, Bruni, Bordignon, Poesio, & Baroni, 2013; Bruni, Boleda, Baroni, & Tran, 2012; Bruni, Tran, & Baroni, 2014; Frome et al., 2013; Silberer, Ferrari, & Lapata, 2016). This work, still in its early stages, is highly promising for providing further insights into what kinds of knowledge perception and language offer to the learner.…”
Section: What Kinds Of Semantic Knowledge Can We Learn From Language
confidence: 94%
“…Concreteness in datasets has been previously studied in either text-only cases (Turney et al., 2011; Hill et al., 2013) or by incorporating human judgments of perception into models (Silberer and Lapata, 2012; Hill and Korhonen, 2014a). Other work has quantified characteristics of concreteness in multimodal datasets (Young et al., 2014; Hill and Korhonen, 2014b; Kiela and Bottou, 2014; Jas and Parikh, 2015; Lazaridou et al., 2015; Silberer et al., 2016; Lu et al., 2017; Bhaskar et al., 2017). Most related to our work is that of ; the authors use Google image search to collect 50 images each for a variety of words and compute the average cosine similarity between vector representations of returned images.…”
Section: Related Work
confidence: 99%
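The measure quoted above, averaging cosine similarity over the image vectors retrieved for a word, is simple to state precisely. Here is a small sketch, assuming each word's images have already been mapped to a feature matrix; the `image_vecs` input format is a hypothetical convenience, not the cited authors' exact pipeline:

```python
# Sketch of the quoted measure: mean pairwise cosine similarity among the
# image vectors retrieved for one query word. Higher values suggest the
# word's images are visually homogeneous (a proxy for concreteness).
import numpy as np

def mean_pairwise_cosine(image_vecs: np.ndarray) -> float:
    """image_vecs: (n_images, dim) feature matrix for one query word."""
    norms = np.linalg.norm(image_vecs, axis=1, keepdims=True)
    unit = image_vecs / np.clip(norms, 1e-12, None)  # L2-normalize rows
    sims = unit @ unit.T                             # all pairwise cosines
    n = len(image_vecs)
    # Average over the n*(n-1) off-diagonal pairs only (exclude self-similarity).
    return float((sims.sum() - np.trace(sims)) / (n * (n - 1)))
```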
“…In the vision community, image captioning has received much recent attention, where the goal is to produce a fluent and informative natural language description for a visual scene [27-30]. In natural language processing, images have also been used to capture aspects of meaning (semantics) of written language; see [31,32] for reviews. Other studies have considered multimodal modelling of sounds (not speech) with text and images [33-35], and phonemes with images [36].…”
Section: Related Work
confidence: 99%