2020
DOI: 10.31234/osf.io/u43p7
Preprint

Language models explain word reading times better than empirical predictability

Abstract: While word predictability from sentence context is typically investigated via cloze completion probabilities (CCP), it can be more deeply understood by relying on language models (LMs), which allow the three key components of memory to be defined: memory starts with experience, as implemented by a text corpus, here defined by Wikipedia, capturing general knowledge, and (movie) subtitles, approximating social interactions. LMs then consolidate a long-term memory structure from experience, as addressed by n-gram, topics and…
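To make the abstract's n-gram component concrete, here is a minimal sketch of a bigram model that derives word predictability from corpus experience. The toy corpus, whitespace tokenization, and add-alpha smoothing are illustrative assumptions, not the paper's actual training setup.

```python
# Minimal sketch: bigram predictability from a corpus (illustrative only).
from collections import Counter
import math

def train_bigram(tokens, alpha=0.1):
    """Count unigrams and bigrams; alpha is add-alpha smoothing (assumed)."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    vocab = len(unigrams)

    def predictability(prev, word):
        # P(word | prev) with add-alpha smoothing over the vocabulary
        return (bigrams[(prev, word)] + alpha) / (unigrams[prev] + alpha * vocab)

    return predictability

corpus = "the quick brown fox jumps over the lazy dog".split()
p = train_bigram(corpus)
print(p("the", "quick"))              # predictability of "quick" given "the"
print(-math.log2(p("the", "quick")))  # the same quantity as surprisal, in bits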

Cited by 3 publications (4 citation statements; published 2021–2022) | References 80 publications
“…Regarding (2), GPT-2 surprisal numerically outperforms cloze surprisal in all comparisons, significantly so for first-pass and go-past durations (Figure 4). This outcome suggests that transformer language models are now on average at (or beyond) parity with cloze norms as estimators of human language processing difficulty (see also Hofmann et al., 2021; Michaelov, Coulson, and Bergen, 2022).…”
Section: Do Results Change Under Cloze Estimates Of Word Predictability?
confidence: 96%
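As a concrete illustration of the quantity being compared in this statement, the sketch below computes per-token GPT-2 surprisal with the Hugging Face transformers library. The model name ("gpt2") and the nats-to-bits conversion are ordinary conventions, not necessarily the exact setup of the cited studies; aligning subword tokens to the words used in eye-tracking analyses is a further step omitted here.

```python
# Minimal sketch: per-token GPT-2 surprisal (illustrative setup).
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def token_surprisals(sentence):
    ids = tokenizer(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits
    # log P(token_i | tokens_<i): shift logits against targets by one;
    # the first token gets no surprisal since it has no left context here.
    logprobs = torch.log_softmax(logits[0, :-1], dim=-1)
    targets = ids[0, 1:]
    nats = -logprobs[torch.arange(targets.numel()), targets]
    bits = nats / torch.log(torch.tensor(2.0))
    return list(zip(tokenizer.convert_ids_to_tokens(targets.tolist()), bits.tolist()))

for tok, s in token_surprisals("The children went outside to play."):
    print(f"{tok!r}: {s:.2f} bits")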
“…Indeed, the use of statistical rather than cloze predictability estimates has been cited as a criticism of prior work on the functional form of word predictability effects (Brothers and Kuperberg, 2021). However, some have argued that the cloze task may measure different cognitive processes than those that underlie real-time language comprehension (Smith and Levy, 2011; Staub et al., 2015), and there is currently debate as to whether cloze estimates underperform (Frisson, Rayner, and Pickering, 2005; Smith and Levy, 2011; Lopukhina, Lopukhin, and Laurinavichyute, 2021) or outperform (Hofmann et al., 2021; Michaelov, Coulson, and Bergen, 2022) statistical language models as estimators of human processing difficulty.…”
Section: Do Results Change Under Cloze Estimates Of Word Predictability?
confidence: 99%
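To put the two kinds of estimate on a common scale, cloze completion probabilities are typically converted to surprisal as well; a minimal sketch follows. The floor applied to zero-cloze items is an assumed convention, since studies differ in how they handle completions that no participant produced.

```python
# Minimal sketch: cloze probability -> surprisal (floor value is assumed).
import math

def cloze_surprisal(cloze_prob, floor=1 / 100):
    # Zero cloze probabilities make -log2 undefined, so clip to a floor,
    # e.g. below one response out of 100 participants.
    return -math.log2(max(cloze_prob, floor))

print(cloze_surprisal(0.85))  # highly predictable word: low surprisal
print(cloze_surprisal(0.0))   # zero-cloze word: clipped at the floor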
“…As unrealistic as this example may appear, imagine we had reliable data about the reading materials of a person for the last 10 years (newspapers, websites, books, etc.); we could then construct reader-specific DSMs and predict individual reading behavior with remarkable accuracy (Hofmann et al., 2020), but also, of course, make sophisticated guesses about this person's opinions, preferences, and so on: in other words, things that big internet companies already use to further their business.…”
Section: Introduction
confidence: 99%
“…One of the strongest paradigms in computational semantics research, on the other hand, has focused on representing words as distributional vectors and on assessing their semantic similarity via the similarity of their patterns of linguistic co-occurrence, extracted from large-scale textual corpora (Turney and Pantel, 2010; Lenci, 2018). Given the success of Vector Space Models (henceforth VSMs) such as Word2Vec and GloVe (Pennington et al., 2014), researchers in cognitive science have successfully tested them on a variety of psycholinguistic tasks, including the prediction of word associates (Mandera et al., 2017; Nematzadeh et al., 2017) and the modeling of human-elicited cloze completions of sentences (Hofmann et al., 2017) and of association ratings (Hofmann et al., 2018). Interestingly, VSMs trained directly on word associations have been shown to outperform those trained on textual corpora in predicting human similarity and relatedness judgements, suggesting that such associations provide a more accurate reflection of the structure of the mental lexicon (De Deyne et al., 2016).…”
Section: Introduction
confidence: 99%
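A minimal sketch of the VSM similarity computation this statement describes: cosine similarity over distributional vectors. The vectors below are toy values for illustration; actual studies load pretrained Word2Vec or GloVe embeddings.

```python
# Minimal sketch: cosine similarity between word vectors (toy embeddings).
import numpy as np

def cosine(u, v):
    # Cosine of the angle between two word vectors: the standard
    # similarity measure in VSM research (Turney and Pantel, 2010).
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy 4-dimensional "embeddings" (illustrative values only)
vectors = {
    "cat": np.array([0.9, 0.1, 0.3, 0.0]),
    "dog": np.array([0.8, 0.2, 0.4, 0.1]),
    "car": np.array([0.1, 0.9, 0.0, 0.5]),
}
print(cosine(vectors["cat"], vectors["dog"]))  # semantically related: high
print(cosine(vectors["cat"], vectors["car"]))  # unrelated: lower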