2023 · Preprint
DOI: 10.31234/osf.io/z38u6

Surprisal does not explain syntactic disambiguation difficulty: evidence from a large-scale benchmark

Abstract: Prediction has been proposed as an overarching principle that explains human information processing in language and beyond. To what degree can processing difficulty in syntactically complex sentences, one of the major concerns of psycholinguistics, be explained by predictability, as estimated using computational language models? A precise, quantitative test of this question requires a much larger-scale data collection effort than has been undertaken in the past. We present the Syntactic Ambiguity Processing Benchmark…

Cited by 20 publications (16 citation statements) · References 78 publications
“…Schijndel & Linzen [13] fit a linear regression model of reading times as a function of surprisal and used it to predict garden-path effects. Like us, they observed that surprisal underestimates garden-path effects; unlike us, however, they could not distinguish between NP/S and NP/Z sentences [13,26,27]. Although statistical tests are not reported in [13], our model clearly outperforms the results obtained from surprisal, and our predictions for NP/Z sentences are significantly higher than for NP/S sentences; see table 9.…”
Section: (B) Comparison With Surprisal (supporting)
confidence: 63%
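The approach this citation describes, regressing reading times on surprisal and using the fitted slope to predict the size of a garden-path effect, can be sketched roughly as follows. All data, coefficients, and surprisal values below are invented for illustration; they are not taken from [13] or from the benchmark paper.

```python
import numpy as np

# Hypothetical sketch: fit a linear regression of per-word reading times (ms)
# on surprisal (bits) using synthetic "filler" data, then use the fitted
# slope to convert a surprisal difference at a disambiguating word into a
# predicted garden-path effect in milliseconds.
rng = np.random.default_rng(0)

# Synthetic filler items: surprisal in bits, reading time in ms.
surprisal = rng.uniform(1.0, 12.0, size=200)
reading_time = 250.0 + 15.0 * surprisal + rng.normal(0.0, 20.0, size=200)

# Ordinary least squares via lstsq (np.polyfit would work equally well).
X = np.column_stack([np.ones_like(surprisal), surprisal])
intercept, slope = np.linalg.lstsq(X, reading_time, rcond=None)[0]

# Predicted garden-path effect = slope * (surprisal in the ambiguous
# condition minus surprisal in the unambiguous control) at the
# disambiguating word; the two surprisal values here are made up.
delta_surprisal = 6.5 - 2.0
predicted_effect_ms = slope * delta_surprisal
print(f"slope: {slope:.1f} ms/bit, predicted effect: {predicted_effect_ms:.1f} ms")
```

The underestimation the quote refers to arises when empirically measured garden-path effects are much larger than `predicted_effect_ms` computed this way.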
“…Indeed, the reading times used for the linear regression were only averages across different sentences and different participants, and this caused discrepancies in our results. We plan to use more detailed datasets, such as the one recently released in [27]…”
Section: Conclusion and Discussion (mentioning)
confidence: 99%
“…We do agree with Houghton et al that it can be useful to compare language in DNNs and humans to explore the capacities of DNNs that do not have any language-specific learning mechanism. But at present, not only do the learning objectives and learning constraints seem wildly different in the two systems, but also, the performance of fully trained models "sharply diverges" from humans in controlled experiments (Huang et al, 2023).…”
Section: R62 Marketing and (Mis)characterizing Research Findings (mentioning)
confidence: 99%
“…In other words, although prior evidence favors an inferential (rather than a procedural preactivation-based) interpretation of predictability effects and thus implicates inference as a key “causal bottleneck” on processing demand, the present finding of surprisal-independent frequency effects could suggest limits on the scope of this bottleneck: frequency (and thus plausibly lexical retrieval) also plays a large and surprisal-independent role in determining how long participants spend reading words. Given the remarkable success of surprisal in accounting for a range of language processing phenomena across diverse experimental measures (Demberg & Keller, 2008; Frank & Bod, 2011; Frank et al., 2015; Heilbron et al., 2022; Hoover et al., 2023; Lopopolo et al., 2017; Roark et al., 2009; Shain et al., 2020, in press; Smith & Levy, 2013; van Schijndel & Schuler, 2015; Wilcox et al., 2020), discoveries highlighting the explanatory limits of surprisal offer opportunities for new insights into the mechanisms and representational format of incremental meaning construction during language comprehension (e.g., Huang et al., 2023; van Schijndel & Linzen, 2021).…”
Section: Discussion (mentioning)
confidence: 99%