2020
DOI: 10.31234/osf.io/u43p7
Preprint

Language models explain word reading times better than empirical predictability

Abstract: While word predictability from sentence context is typically investigated via cloze completion probabilities (CCP), it can be more deeply understood by relying on language models (LMs), which allow the three key components of memory to be defined: memory starts with experience, as implemented by a text corpus, here defined by Wikipedia, capturing general knowledge, and (movie) subtitles, approximating social interactions. LMs then consolidate a long-term memory structure from experience, as addressed by n-gram, topics and…
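To make the abstract's n-gram component concrete, here is a minimal sketch of a bigram model that derives word predictability from corpus experience. The toy corpus, whitespace tokenization, and add-alpha smoothing are illustrative assumptions, not the paper's actual training setup.

```python
# Minimal sketch: bigram predictability from a corpus (illustrative only).
from collections import Counter
import math

def train_bigram(tokens, alpha=0.1):
    """Count unigrams and bigrams; alpha is add-alpha smoothing (assumed)."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    vocab = len(unigrams)

    def predictability(prev, word):
        # P(word | prev) with add-alpha smoothing over the vocabulary
        return (bigrams[(prev, word)] + alpha) / (unigrams[prev] + alpha * vocab)

    return predictability

corpus = "the quick brown fox jumps over the lazy dog".split()
p = train_bigram(corpus)
print(p("the", "quick"))              # predictability of "quick" given "the"
print(-math.log2(p("the", "quick")))  # the same quantity as surprisal, in bits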

Cited by 3 publications (4 citation statements; published 2021–2022) | References 80 publications
“…Regarding (2), GPT-2 surprisal numerically outperforms cloze surprisal in all comparisons, significantly so for first-pass and go-past durations (Figure 4). This outcome suggests that transformer language models are now on average at (or beyond) parity with cloze norms as estimators of human language processing difficulty (see also Hofmann et al., 2021; Michaelov, Coulson, and Bergen, 2022).…”
Section: Do Results Change Under Cloze Estimates Of Word Predictability?
confidence: 96%
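As a concrete illustration of the quantity being compared in this statement, the sketch below computes per-token GPT-2 surprisal with the Hugging Face transformers library. The model name ("gpt2") and the nats-to-bits conversion are ordinary conventions, not necessarily the exact setup of the cited studies; aligning subword tokens to the words used in eye-tracking analyses is a further step omitted here.

```python
# Minimal sketch: per-token GPT-2 surprisal (illustrative setup).
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def token_surprisals(sentence):
    ids = tokenizer(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits
    # log P(token_i | tokens_<i): shift logits against targets by one;
    # the first token gets no surprisal since it has no left context here.
    logprobs = torch.log_softmax(logits[0, :-1], dim=-1)
    targets = ids[0, 1:]
    nats = -logprobs[torch.arange(targets.numel()), targets]
    bits = nats / torch.log(torch.tensor(2.0))
    return list(zip(tokenizer.convert_ids_to_tokens(targets.tolist()), bits.tolist()))

for tok, s in token_surprisals("The children went outside to play."):
    print(f"{tok!r}: {s:.2f} bits")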
“…Indeed, the use of statistical rather than cloze predictability estimates has been cited as a criticism of prior work on the functional form of word predictability effects (Brothers and Kuperberg, 2021). However, some have argued that the cloze task may measure different cognitive processes than those that underlie real-time language comprehension (Smith and Levy, 2011; Staub et al., 2015), and there is currently debate as to whether cloze estimates underperform (Frisson, Rayner, and Pickering, 2005; Smith and Levy, 2011; Lopukhina, Lopukhin, and Laurinavichyute, 2021) or outperform (Hofmann et al., 2021; Michaelov, Coulson, and Bergen, 2022) statistical language models as estimators of human processing difficulty.…”
Section: Do Results Change Under Cloze Estimates Of Word Predictability?
confidence: 99%
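To put the two kinds of estimate on a common scale, cloze completion probabilities are typically converted to surprisal as well; a minimal sketch follows. The floor applied to zero-cloze items is an assumed convention, since studies differ in how they handle completions that no participant produced.

```python
# Minimal sketch: cloze probability -> surprisal (floor value is assumed).
import math

def cloze_surprisal(cloze_prob, floor=1 / 100):
    # Zero cloze probabilities make -log2 undefined, so clip to a floor,
    # e.g. below one response out of 100 participants.
    return -math.log2(max(cloze_prob, floor))

print(cloze_surprisal(0.85))  # highly predictable word: low surprisal
print(cloze_surprisal(0.0))   # zero-cloze word: clipped at the floor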
“…As unrealistic as this example may appear, imagine we had reliable data about the reading materials of a person for the last 10 years (newspapers, websites, books, etc.); we could then construct reader-specific DSMs and predict individual reading behavior with remarkable accuracy (Hofmann et al., 2020), but also, of course, make sophisticated guesses about this person's opinions, preferences, and so on: in other words, things that big internet companies already use to further their business.…”
Section: Introduction
confidence: 99%
“…One of the strongest paradigms in computational semantics research, on the other hand, has focused on representing words as distributional vectors and on assessing their semantic similarity via the similarity of their patterns of linguistic co-occurrence, extracted from large-scale textual corpora (Turney and Pantel, 2010; Lenci, 2018). Given the success of Vector Space Models (henceforth VSMs) such as Word2Vec and GloVe (Pennington et al., 2014), researchers in cognitive science have successfully tested them on a variety of psycholinguistic tasks, including the prediction of word associates (Mandera et al., 2017; Nematzadeh et al., 2017) and the modeling of human-elicited cloze completions of sentences (Hofmann et al., 2017) and of association ratings (Hofmann et al., 2018). Interestingly, VSMs trained directly on word associations have been shown to outperform those trained on textual corpora in predicting human similarity and relatedness judgements, suggesting that such associations provide a more accurate reflection of the structure of the mental lexicon (De Deyne et al., 2016).…”
Section: Introduction
confidence: 99%
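A minimal sketch of the VSM similarity computation this statement describes: cosine similarity over distributional vectors. The vectors below are toy values for illustration; actual studies load pretrained Word2Vec or GloVe embeddings.

```python
# Minimal sketch: cosine similarity between word vectors (toy embeddings).
import numpy as np

def cosine(u, v):
    # Cosine of the angle between two word vectors: the standard
    # similarity measure in VSM research (Turney and Pantel, 2010).
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy 4-dimensional "embeddings" (illustrative values only)
vectors = {
    "cat": np.array([0.9, 0.1, 0.3, 0.0]),
    "dog": np.array([0.8, 0.2, 0.4, 0.1]),
    "car": np.array([0.1, 0.9, 0.0, 0.5]),
}
print(cosine(vectors["cat"], vectors["dog"]))  # semantically related: high
print(cosine(vectors["cat"], vectors["car"]))  # unrelated: lower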