Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing
DOI: 10.18653/v1/d15-1002
Distributional vectors encode referential attributes

Abstract: Distributional methods have proven to excel at capturing fuzzy, graded aspects of meaning (Italy is more similar to Spain than to Germany). In contrast, it is difficult to extract the values of more specific attributes of word referents from distributional representations, attributes of the kind typically found in structured knowledge bases (Italy has 60 million inhabitants). In this paper, we pursue the hypothesis that distributional vectors also implicitly encode referential attributes. We show that a standa…

Cited by 55 publications (62 citation statements)
References 21 publications (16 reference statements)
“…Early work in probing (also known as diagnostic classification) extracted properties like parts-of-speech, gender, tense, and number from distributional word vector spaces like word2vec and GloVe (Mikolov et al., 2013; Pennington et al., 2014) using linear classifiers (Köhn, 2015; Gupta et al., 2015). Soon after, the investigation of intermediate layers of deep models using linear probes was introduced independently by Ettinger et al (2016) and Shi et al (2016) in NLP and Alain and Bengio (2016) in computer vision.…”
Section: Related Work
confidence: 99%
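The linear-probe recipe described in the quote above — fit a linear classifier on fixed word vectors and check whether a property can be read out — can be sketched on toy data. All vectors and labels below are synthetic, invented for illustration; this is not the cited papers' actual setup:

```python
import numpy as np

# Minimal linear-probe sketch (illustrative, not the cited papers' setup):
# fit a linear classifier on word vectors to read out a binary property
# such as grammatical number (singular vs. plural).
rng = np.random.default_rng(1)
d = 10
direction = rng.normal(size=d)            # assumed "number" direction in the space
X = rng.normal(size=(40, d))              # toy word vectors
y = (X @ direction > 0).astype(float)     # toy binary labels derived from that direction

# Least-squares linear probe: regress the {0,1} labels on the vectors,
# then threshold the prediction at 0.5 to classify.
W, *_ = np.linalg.lstsq(X, y, rcond=None)
acc = ((X @ W > 0.5) == (y > 0.5)).mean() # train accuracy of the probe
```

High probe accuracy is taken as evidence that the property is linearly decodable from the vectors; the cited work applies the same logic with real embeddings and annotated word lists.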
“…There is some empirical evidence that distributional data can be used for inferring properties in Johns & Jones 2012, Făgărăşan, Vecchi & Clark 2015, Gupta et al 2015, and Herbelot & Vecchi 2015. They test whether distributional vectors can be used to predict a word's properties (where, as above, I use the term "properties of a word" to mean properties that apply to all entities in the word's extension).…”
Section: :20
confidence: 99%
“…Finally, there is a need for more research on distributional models for property inference, to develop efficient models beyond the initial approaches proposed by Johns & Jones (2012), Făgărăşan, Vecchi & Clark (2015), Herbelot & Vecchi (2015) and Gupta et al (2015) and to see what kinds of properties can be reliably learned and whether verb properties can be learned as well as noun properties.…”
Section: Katrin Erk
confidence: 99%
“…But it implements the mapping as a systematic linear transformation. Our approach is similar to Gupta et al. (2015), who predict numerical attributes for unseen concepts (countries and cities) from distributional vectors, getting comparably accurate estimates for features such as the GDP or CO2 emissions of a country. We complement such research by providing a more formal interpretation of the mapping between language and world knowledge.…”
Section: Generalised Quantifiers
confidence: 99%
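The attribute-prediction setup attributed to Gupta et al. (2015) in the quote above amounts to regressing a numeric attribute on distributional vectors and applying the fitted map to unseen concepts. A minimal ridge-regression sketch on synthetic data follows; the "country" vectors and attribute values are invented for illustration:

```python
import numpy as np

# Hypothetical toy setup: 5 "country" vectors (d = 4) and a numeric
# attribute (e.g. log GDP) assumed to be roughly linear in the embedding.
rng = np.random.default_rng(0)
d = 4
true_w = rng.normal(size=d)                 # assumed attribute direction
X = rng.normal(size=(5, d))                 # distributional vectors for seen countries
y = X @ true_w + 0.01 * rng.normal(size=5)  # noisy attribute values

# Ridge regression, closed form: w = (X^T X + lam*I)^{-1} X^T y
lam = 0.1
w = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# Predict the attribute for an unseen country's vector.
x_new = rng.normal(size=d)
pred = x_new @ w
```

The quoted passage's point is that such a simple linear map, fit on a handful of seen concepts, already yields usable estimates for unseen ones.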