Proceedings of the 28th International Conference on Computational Linguistics 2020
DOI: 10.18653/v1/2020.coling-main.67
A Closer Look at Linguistic Knowledge in Masked Language Models: The Case of Relative Clauses in American English

Abstract: Transformer-based language models achieve high performance on various tasks, but we still lack understanding of the kind of linguistic knowledge they learn and rely on. We evaluate three models (BERT, RoBERTa, and ALBERT), testing their grammatical and semantic knowledge by sentence-level probing, diagnostic cases, and masked prediction tasks. We focus on relative clauses (in American English) as a complex phenomenon needing contextual information and antecedent identification to be resolved. Based on a natura…
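The abstract describes masked-prediction probes over BERT, RoBERTa, and ALBERT. The sketch below is only a rough illustration of such a probe (not the authors' code), assuming the HuggingFace transformers library, the public bert-base-uncased and roberta-base checkpoints, and an example relative-clause sentence chosen for illustration.

```python
# Minimal sketch of a masked-prediction probe (not the paper's exact setup).
# Assumes HuggingFace `transformers` and public checkpoints.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

def top_fillers(model_name: str, sentence: str, k: int = 5):
    """Return the model's top-k candidates for the single masked position."""
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForMaskedLM.from_pretrained(model_name)
    model.eval()

    # Use the checkpoint's own mask token ([MASK] for BERT, <mask> for RoBERTa).
    text = sentence.replace("[MASK]", tokenizer.mask_token)
    inputs = tokenizer(text, return_tensors="pt")
    mask_pos = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]

    with torch.no_grad():
        logits = model(**inputs).logits

    probs = logits[0, mask_pos].softmax(dim=-1)
    top = probs.topk(k, dim=-1)
    return [tokenizer.decode([int(idx)]).strip() for idx in top.indices[0]]

# Hypothetical probe of the relativizer position in an object relative clause.
print(top_fillers("bert-base-uncased", "The lawyer [MASK] the jury distrusted lost the case."))
print(top_fillers("roberta-base", "The lawyer [MASK] the jury distrusted lost the case."))
```

Comparing the ranked fillers at the relativizer slot across checkpoints is one simple way to inspect whether a model prefers a grammatical relative-clause continuation.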

Citations: cited by 1 publication (1 citation statement).
References: 24 publications.
“…Our approach extracts word vectors for input to a Gaussian mixture model from the Transformer language model RoBERTa (Liu et al., 2019). While RoBERTa has not been used as extensively in psycholinguistics as models such as GPT-2 (Radford et al., 2019), it has nevertheless proven useful for modeling a wide variety of phenomena, ranging from reading times (Oh, 2021), to neural responses (Mikolov, Coulson, & Bergen, 2022), to linguistic judgments (Mosbach et al., 2020; Lau et al., 2020). We focus on RoBERTa partly because it "encodes" language and can represent how a word fits with the preceding and following context surrounding a particular token.…”
Section: Representing Semantic Clusters of Cloze Responses
confidence: 99%
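The citing work describes feeding RoBERTa word vectors into a Gaussian mixture model. The sketch below is a minimal illustration of that kind of pipeline, not the citing authors' implementation: it assumes the HuggingFace transformers and scikit-learn libraries, uses mean-pooled roberta-base hidden states as a simplified word representation, and clusters a handful of hypothetical cloze responses.

```python
# Illustrative pipeline: RoBERTa vectors -> Gaussian mixture clustering.
# Assumes `transformers` and `scikit-learn`; the cloze responses and carrier
# sentence are hypothetical stand-ins, not data from the citing study.
import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.mixture import GaussianMixture

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModel.from_pretrained("roberta-base")
model.eval()

def response_vector(sentence: str) -> torch.Tensor:
    """Mean-pool the last hidden layer as a simplified contextual representation."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # (1, seq_len, 768)
    return hidden.mean(dim=1).squeeze(0)

# Hypothetical cloze completions embedded in a shared carrier sentence.
cloze_responses = ["coffee", "tea", "water", "novel", "letter", "poem"]
vectors = torch.stack(
    [response_vector(f"She ordered a {w}.") for w in cloze_responses]
).numpy()

# Fit a two-component Gaussian mixture over the response vectors and
# report which cluster each response falls into.
gmm = GaussianMixture(n_components=2, covariance_type="diag", random_state=0).fit(vectors)
print(dict(zip(cloze_responses, gmm.predict(vectors))))
```

With real cloze data one would typically use many more responses per item and may restrict the representation to the tokens of the response word itself rather than pooling the whole sentence; the diagonal covariance here is an assumption to keep the toy example stable.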