Proceedings of the 28th International Conference on Computational Linguistics 2020
DOI: 10.18653/v1/2020.coling-main.109

How Relevant Are Selectional Preferences for Transformer-based Language Models?

Abstract: Selectional preference is defined as the tendency of a predicate to favour particular arguments within a certain linguistic context and, likewise, to reject others that result in conflicting or implausible meanings. The stellar success of contextual word embedding models such as BERT in NLP tasks has led many to question whether these models have learned linguistic information, but until now most research has focused on syntactic information. We investigate whether BERT contains information on the selectional …
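The probing setup the abstract alludes to can be made concrete with BERT's masked-language-modelling head: score candidate arguments in a masked slot and check whether plausible fillers outrank implausible ones. The sketch below is illustrative only, not the paper's actual protocol; the template sentence, candidate words, and scoring function are assumptions, written against the Hugging Face transformers API.

```python
import torch
from transformers import BertForMaskedLM, BertTokenizer

# Hypothetical probe: compare BERT's log-probabilities for candidate
# arguments in a masked object slot (illustrative, not the paper's setup).
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

def argument_log_prob(template: str, candidate: str) -> float:
    """Log-probability BERT assigns to `candidate` at the [MASK] position."""
    inputs = tokenizer(template, return_tensors="pt")
    mask_idx = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero().item()
    with torch.no_grad():
        logits = model(**inputs).logits  # shape: (1, seq_len, vocab_size)
    log_probs = torch.log_softmax(logits[0, mask_idx], dim=-1)
    return log_probs[tokenizer.convert_tokens_to_ids(candidate)].item()

# A model that encodes selectional preferences should prefer "water".
template = "The man drank a glass of [MASK]."
for word in ("water", "stone"):
    print(f"{word}: {argument_log_prob(template, word):.2f}")
```

Under this kind of setup, a model sensitive to selectional preferences should assign a markedly higher log-probability to the plausible argument than to the implausible one.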

Cited by 11 publications (9 citation statements) · References 32 publications

“…To verify and extend our findings, future work should test LLMs’ knowledge of selectional restrictions on features other than animacy, such as the physical constraints that a predicate places on its patients (Wang, Durrett, & Erk, 2018), evaluate their performance on impossible events that do not violate selectional restrictions per se (e.g., She gave birth to her mother, The man was killed twice, or After 10 coin tosses, she got 12 heads), and conduct more targeted tests of agent‐verb and patient‐verb plausibility (Metheniti et al., 2020).…”
Section: Discussion

“…Matsuki et al., 2011). Computational evidence suggests that BERT models are able to generalize their knowledge of selectional restrictions in novel word‐learning paradigms (Thrush et al., 2020) and can partially rely on the semantics of the head predicate to predict upcoming event participants (Metheniti, Van de Cruys, & Hathout, 2020). The asymmetry in performance on possible/impossible versus likely/unlikely events was independent of the specifics of LLM architecture and training and was additionally present, in an even more marked way, in our baseline models.…”
Section: Discussion

“…Past work has probed token embeddings for knowledge of argument structure (Kann et al., 2019; Pavlick, 2022; Sasano and Korhonen, 2020; Tenney et al., 2019a,b; Zhu and de Melo, 2020). Other work has focused on neural networks’ ability to predict the likelihood of a verb or noun in forms of an argument structure alternation (Chowdhury and Zamparelli, 2019; Hawkins et al., 2020b; Loáiciga et al., 2021; Metheniti et al., 2020; Petty et al., 2022; Yoshida and Oseki, 2022), and whether LLMs distinguish plausible from implausible argument-role mappings in role-reversal sentences (Ettinger, 2020). Though revealing of Type 0 knowledge, that work does not address whether LLMs can apply such knowledge productively, which is what drives our study.…”
Section: Related Work

“…It includes experiments concerning their correlation with human judgment in terms of semantic similarity or their ability to classify word pairs according to different types of relations (Chersoni et al., 2016; Xiang et al., 2020). Other studies also considered contextual embeddings from a semantic viewpoint but in more specific contexts: the impact of their training objectives (Mickus et al., 2020), their level of contextualization (Ethayarajh, 2019), their possible biases (Bommasani et al., 2020), their ability to represent word senses (Coenen et al., 2019), to build representations for rare words (Schick and Schütze, 2020), to account for selectional preferences (Metheniti et al., 2020), or to interpret logical metonymy (Rambelli et al., 2020). Finally, Chronis and Erk (2020) is linked to the representation of word senses through the notion of prototype but mainly applies it for characterizing semantic similarity versus semantic relatedness and abstractness versus concreteness.…”
Section: Semantic Study of Contextual Word Embeddings