“…Our approach extracts word vectors for input to a Gaussian mixture model from the Transformer language model RoBERTa (Liu et al., 2019). While RoBERTa has not been used as extensively in psycholinguistics as models such as GPT-2 (Radford et al., 2019), it has nevertheless proven useful for modeling a wide variety of phenomena, ranging from reading times (Oh, 2021), to neural responses (Michaelov, Coulson, & Bergen, 2022), to linguistic judgments (Mosbach et al., 2020; Lau et al., 2020). We focus on RoBERTa partly because it is a bidirectional encoder: it can represent how a word fits with both the preceding and the following context surrounding a particular token.…”