“…Our approach extracts word vectors for input to a Gaussian mixture model from the Transformer language model RoBERTa (Liu et al., 2019). While RoBERTa has not been used as extensively in psycholinguistics as models such as GPT-2 (Radford et al., 2019), it has nevertheless proven useful for modeling a wide variety of phenomena, ranging from reading times (Oh, 2021), to neural responses (Michaelov, Coulson, & Bergen, 2022), to linguistic judgments (Mosbach et al., 2020; Lau et al., 2020). We focus on RoBERTa partly because it is a bidirectional encoder: it can represent how a word fits with both the preceding and the following context surrounding a particular token.…”