Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop 2020
DOI: 10.18653/v1/2020.acl-srw.18
Why is penguin more similar to polar bear than to sea gull? Analyzing conceptual knowledge in distributional models

Abstract: What do powerful models of word meaning created from distributional data (e.g. Word2vec (Mikolov et al., 2013), BERT (Devlin et al., 2019), and ELMo (Peters et al., 2018)) represent? What causes words to be similar in the semantic space? What type of information is lacking? This thesis proposal presents a framework for investigating the information encoded in distributional semantic models. Several analysis methods have been suggested, but they have been shown to be limited and are not well understood. This app…

Cited by 4 publications (4 citation statements). References 36 publications (41 reference statements).
“…We have designed an annotation task to analyze how different aspects of word meaning are represented in distributional representations (Sommerauer et al., 2019). In this paper, we investigate how we can measure the quality of the annotations and capture valid disagreement, which is crucial information for the diagnostic experiments we want to conduct (Sommerauer, 2020). The task is similar to that of Herbelot and Vecchi (2016), but uses basic yes-no questions so that it is suitable for crowd-annotations.…”
Section: Related Work (mentioning)
Confidence: 99%
“…In this paper, we build upon the NLP literature on disagreement, or bias, in annotation (e.g., see Q. Shen and Rose, 2021; Geva, Goldberg, and Berant, 2019; Sommerauer, 2020; Plank, Hovy, and Søgaard, 2014) and so-called perspectivism (Cabitza, Campagner, and Basile, 2023; Havens et al., 2022), i.e. the adoption of methods that integrate the opinions and perspectives of the human subjects involved in the knowledge representation step of the machine learning process (Cabitza, Campagner, and Basile, 2023).…”
Section: Introduction (mentioning)
Confidence: 99%
“…Such features correspond to stereotypic tacit assumptions (Prince, 1978): common-sense knowledge we have about the real world. There is some evidence that language models implicitly encode such knowledge (Da and Kasai, 2019; Weir et al., 2020); however, coverage of different types of knowledge may be inconsistent: there is evidence that these models fail to capture some types of semantic knowledge, such as visual perceptual information (Sommerauer and Fokkens, 2018; Sommerauer, 2020), and there are questions about the completeness of such empirical studies (Fagarasan et al., 2015; Bulat et al., 2016; Silberer, 2017; Derby et al., 2019). In general, there has been only limited work investigating whether these neural language models activate lexico-semantic knowledge similarly to humans, further restricted by the fact that such knowledge probing is only performed on latent representations that have received the target concept, ignoring theories of language comprehension and acquisition that emphasise the importance of prediction (Graesser et al., 1994; Dell and Chang, 2014; Kuperberg and Jaeger, 2016).…”
Section: Introduction (mentioning)
Confidence: 99%