Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop 2020
DOI: 10.18653/v1/2020.acl-srw.18
Why is penguin more similar to polar bear than to sea gull? Analyzing conceptual knowledge in distributional models

Abstract: What do powerful models of word meaning created from distributional data (e.g. Word2vec (Mikolov et al., 2013), BERT (Devlin et al., 2019), and ELMo (Peters et al., 2018)) represent? What causes words to be similar in the semantic space? What type of information is lacking? This thesis proposal presents a framework for investigating the information encoded in distributional semantic models. Several analysis methods have been suggested, but they have been shown to be limited and are not well understood. This app…

Cited by 4 publications (4 citation statements). References 36 publications (41 reference statements).
“…We have designed an annotation task to analyze how different aspects of word meaning are represented in distributional representations (Sommerauer et al., 2019). In this paper, we investigate how we can measure the quality of the annotations and capture valid disagreement, which is crucial information for the diagnostic experiments we want to conduct (Sommerauer, 2020). The task is similar to that of Herbelot and Vecchi (2016), but uses basic yes-no questions so that it is suitable for crowd-annotations.…”
Section: Related Work (mentioning)
Confidence: 99%
“…In this paper, we build upon the NLP literature on disagreement, or bias, in annotation (e.g., see Q. Shen and Rose, 2021; Geva, Goldberg, and Berant, 2019; Sommerauer, 2020; Plank, Hovy, and Søgaard, 2014) and so-called perspectivism (Cabitza, Campagner, and Basile, 2023; Havens et al., 2022), i.e. the adoption of methods that integrate the opinions and perspectives of the human subjects involved in the knowledge representation step of the machine learning process (Cabitza, Campagner, and Basile, 2023).…”
Section: Introduction (mentioning)
Confidence: 99%
“…Such features correspond to stereotypic tacit assumptions (Prince, 1978): common-sense knowledge we have about the real world. There is some evidence that language models implicitly encode such knowledge (Da and Kasai, 2019; Weir et al., 2020); however, coverage of different types of knowledge may be inconsistent: there is evidence that these models fail to capture some types of semantic knowledge, such as visual perceptual information (Sommerauer and Fokkens, 2018; Sommerauer, 2020), and there are questions about the completeness of such empirical studies (Fagarasan et al., 2015; Bulat et al., 2016; Silberer, 2017; Derby et al., 2019). In general, there has been only limited work investigating whether these neural language models activate lexico-semantic knowledge similarly to humans, further restricted by the fact that such knowledge probing is only performed on latent representations that have received the target concept, ignoring theories of language comprehension and acquisition that emphasise the importance of prediction (Graesser et al., 1994; Dell and Chang, 2014; Kuperberg and Jaeger, 2016).…”
Section: Introduction (mentioning)
Confidence: 99%