Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), 2021
DOI: 10.18653/v1/2021.acl-long.290
Surprisal Estimators for Human Reading Times Need Character Models

Abstract: While the use of character models has been popular in NLP applications, it has not been explored much in the context of psycholinguistic modeling. This paper presents a character model that can be applied to a structural parser-based processing model to calculate word generation probabilities. Experimental results show that surprisal estimates from a structural processing model using this character model deliver substantially better fits to self-paced reading, eye-tracking, and fMRI data than those from large-…
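For reference, the surprisal estimates discussed in the abstract follow the standard information-theoretic definition; the second line is only an illustrative assumption about how a character model might supply the word generation probability by factoring a word w_t into its characters c_1, ..., c_n (the paper's exact parameterization may differ):

  \[ S(w_t) = -\log \Pr(w_t \mid w_1, \dots, w_{t-1}) \]
  \[ \Pr(w_t \mid w_1, \dots, w_{t-1}) = \prod_{i=1}^{n} \Pr(c_i \mid c_1, \dots, c_{i-1},\; w_1, \dots, w_{t-1}) \]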

Cited by 15 publications (16 citation statements)
References 29 publications
“…Because they may be seen as directly optimizing the distribution over next-word predictions, they provide strong tests of the hypothesis that human language processing rests on a fundamentally similar principle. Third, recent work has shown that even larger transformer models trained on even larger corpora (models that show excellent next-word prediction performance) nevertheless exhibit a worse fit to human reading times than less capable models such as the GPT-2 model we tested (Oh and Schuler, 2023; Shain et al., 2022), reversing an earlier trend observed with weaker models (Wilcox et al., 2020; Goodkind and Bicknell, 2018). This suggests that further improving the underlying language model's next-word-prediction accuracy is unlikely to improve its surprisal-based estimates of our effects of interest.…”
Section: Implications for Theories of Sentence Processing
confidence: 87%
“…A full description of left-corner parsing models of sentence comprehension is beyond the scope of this presentation (see, e.g., Oh et al., 2021; Rasmussen & Schuler, 2018), which is restricted to the minimum details needed to define the predictors covered here. At a high level, phrasal structure derives from a sequence of lexical match (±L) and grammatical match (±G) decisions made at each word (see Oh et al., 2021 for relations to equivalent terms in the prior parsing literature). In terms of memory structures, the lexical decision depends on whether a new element (representing the current word and its hypothesized part of speech) matches current expectations about the upcoming syntactic category; if so, it is composed with the derivation at the front of the memory store (+L), and if not, it is pushed to the store as a new derivation fragment (-L).…”
Section: Left Corner Predictors
confidence: 99%
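To make the ±L decision in the quoted passage concrete, here is a minimal sketch in Python, assuming a simplified memory store of derivation fragments; the names (DerivationFragment, expects, compose, lexical_decision) and the equality-based category match are illustrative assumptions, not the parser's actual implementation:

# Minimal sketch of the +/-L lexical decision in a left-corner parser.
# The memory store holds derivation fragments; the front fragment carries
# an expectation about the upcoming syntactic category.

class DerivationFragment:
    def __init__(self, expected_category=None):
        self.expected_category = expected_category
        self.children = []

    def expects(self, category):
        # Simplified match test: exact category equality.
        return self.expected_category == category

    def compose(self, word, category):
        # Attach the new lexical element to this derivation fragment.
        self.children.append((word, category))


def lexical_decision(store, word, hypothesized_category):
    """Return '+L' or '-L' and update the memory store accordingly."""
    if store and store[0].expects(hypothesized_category):
        # +L: the new element matches the front fragment's expectation,
        # so it is composed with the derivation at the front of the store.
        store[0].compose(word, hypothesized_category)
        return "+L"
    # -L: no match, so the word is pushed to the store as a new
    # derivation fragment.
    new_fragment = DerivationFragment()
    new_fragment.compose(word, hypothesized_category)
    store.insert(0, new_fragment)
    return "-L"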
“…This article is an extended presentation of Oh et al. (2021), with additional algorithmic details of the left-corner parser and evaluations of structural parsers and neural LMs as surprisal estimators. These additional evaluations include a quantitative analysis of the effect of model capacity on predictive power for neural LMs, as well as a replication of the main experiments using a different regression method that is sensitive to temporal diffusion.…”
Section: Introduction
confidence: 99%