Proceedings of *SEM 2021: The Tenth Joint Conference on Lexical and Computational Semantics
DOI: 10.18653/v1/2021.starsem-1.1

Did the Cat Drink the Coffee? Challenging Transformers with Generalized Event Knowledge

Abstract: Prior research has explored the ability of computational models to predict a word's semantic fit with a given predicate. While much work has been devoted to modeling the typicality relation between verbs and arguments in isolation, in this paper we take a broader perspective by assessing whether and to what extent computational approaches have access to the information about the typicality of entire events and situations described in language (Generalized Event Knowledge). Given the recent success of Transformers…

Cited by 10 publications (11 citation statements)
References 30 publications
“…It is a list of seven structures (cleft, left dislocated, right dislocated, presentative "ci", inverted subject, pseudo-cleft, hanging topic), with a majority of cleft and left-dislocated sentences. As said above, similar results are obtained by the experiment presented in the paper by Pedinotti et al. [11], where in Section IV they test the ability of Transformers (they use RoBERTa) on a small dataset with surface syntactic structures different from the recurrent word order. They modify the sentences to produce cleft and interrogative versions of the same sentences.…”
Section: The Dataset and the State-of-the-Art (supporting)
confidence: 77%
“…A partly similar approach has been attempted by Pedinotti et al. [11], in a paper where they explore the ability of Transformer models to predict transitive verb complements in typical predicate-argument contexts. Their results clearly show the inability to predict low-frequency near-synonyms, thus confirming the sensitivity of BERT-like models to frequency values.…”
Section: Word Predictability in Cognitive and Psycholinguistic Research (mentioning)
confidence: 99%
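The masked-prediction setup these citation statements describe can be sketched as follows. This is a minimal illustration, not the authors' code: `slot_logprob` is a stand-in for querying a real masked language model (e.g., RoBERTa via Hugging Face's `pipeline("fill-mask")`), and the toy log-probabilities are invented. The idea is simply that the model's probability for a candidate argument in a masked slot is read as a typicality (thematic-fit) score.

```python
import math

# Invented, illustrative values for fillers of "The cat drank the [MASK]." —
# a real study would obtain these from a masked LM's output distribution.
TOY_LOGPROBS = {
    "milk": math.log(0.30),
    "water": math.log(0.25),
    "coffee": math.log(0.01),
}

def slot_logprob(sentence: str, filler: str) -> float:
    """Stand-in for querying a masked LM for log P(filler | context)."""
    return TOY_LOGPROBS.get(filler, math.log(1e-6))

def thematic_fit(sentence, candidates):
    """Rank candidate arguments by the model's log-probability in the slot."""
    scored = [(c, slot_logprob(sentence, c)) for c in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

ranking = thematic_fit("The cat drank the [MASK].", ["coffee", "milk", "water"])
```

Under this scoring, a typical filler like "milk" outranks an atypical one like "coffee"; the frequency sensitivity noted above arises because such probabilities track corpus co-occurrence as much as event plausibility.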
“…Does this representation perpetuate the negative stereotype that men are bad at cooking? To investigate this, we should dive deeper into the semantic plausibility learned in language models (Porada et al., 2021; Pedinotti et al., 2021). Unless the focus is on the domain of natural science, there is less agreement on what counts as desirable and undesirable content, and the borderline can change across time and place.…”
Section: Content Validation for Fair Representation (mentioning)
confidence: 99%
“…On the one hand, even non‐fine‐tuned LLMs perform well on multiple tasks designed to probe world knowledge, such as the Winograd Schema Challenge (WSC; Levesque, Davis, & Morgenstern, 2012), the Story Cloze Test (SWAG; Zellers et al., 2018), and the Choice of Plausible Alternatives Test (COPA; Roemmele, Bejan, & Gordon, 2011), so much so that some authors have proposed and evaluated their use as off‐the‐shelf knowledge base models (Kassner, Dufter, & Schütze, 2021; Petroni et al., 2019; Roberts et al., 2020; Tamborrino, Pellicanò, Pannier, Voitot, & Naudin, 2020). On the other hand, studies using more fine‐grained tests have shown that world knowledge in contemporary LLMs is often brittle and depends strongly on the specific way the problem is stated (Elazar et al., 2021a; 2021b; Ettinger, 2020; Kassner & Schütze, 2020; McCoy, Pavlick, & Linzen, 2019; Niven & Kao, 2019; Pedinotti et al., 2021; Ravichander, Hovy, Suleman, Trischler, & Cheung, 2020; Ribeiro, Wu, Guestrin, & Singh, 2020). For example, some authors have noted that, when low‐level co‐occurrence statistics are properly controlled for, LLMs that were considered to have high accuracy on world knowledge tasks start to perform randomly (Elazar, Zhang, Goldberg, & Roth, 2021b; Sakaguchi, Bras, Bhagavatula, & Choi, 2021), highlighting the potential discrepancy between the word‐in‐context prediction objective (which benefits from tracking surface‐level statistics) and world knowledge acquisition (which should be invariant to surface‐level statistics).…”
Section: Introduction (mentioning)
confidence: 99%
“…To assess the plausibility of an arbitrary event, a successful model of GEK must, therefore, acquire robust, generalizable representations of a vast number of actions and their associated restrictions on event participants. Many traditional and current distributional models have been argued to lack the representations of these building blocks for more complex semantic structures (Lenci, 2023; Lenci & Sahlgren, 2023; Pedinotti et al., 2021; Zhu, Li, & De Melo, 2018). The acquisition of GEK is complicated even more because the frequency with which events are reported in the pragmatically influenced texts available in the world is not a robust indicator of the frequency with which they occur in the real world (Gordon & Van Durme, 2013; see also Section 4.3).…”
Section: Introduction (mentioning)
confidence: 99%