Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), 2021
DOI: 10.18653/v1/2021.acl-long.290
Surprisal Estimators for Human Reading Times Need Character Models

Abstract: While the use of character models has been popular in NLP applications, it has not been explored much in the context of psycholinguistic modeling. This paper presents a character model that can be applied to a structural parser-based processing model to calculate word generation probabilities. Experimental results show that surprisal estimates from a structural processing model using this character model deliver substantially better fits to self-paced reading, eye-tracking, and fMRI data than those from large-…
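For reference, the surprisal estimates discussed in the abstract follow the standard information-theoretic definition; the second line is only an illustrative assumption about how a character model might supply the word generation probability by factoring a word w_t into its characters c_1, ..., c_n (the paper's exact parameterization may differ):

  \[ S(w_t) = -\log \Pr(w_t \mid w_1, \dots, w_{t-1}) \]
  \[ \Pr(w_t \mid w_1, \dots, w_{t-1}) = \prod_{i=1}^{n} \Pr(c_i \mid c_1, \dots, c_{i-1},\; w_1, \dots, w_{t-1}) \]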

Cited by 15 publications (16 citation statements)
References 29 publications
“…Because they may be seen as directly optimizing the distribution over next-word predictions, they provide strong tests of the hypothesis that human language processing rests on a fundamentally similar principle. Third, recent work has shown that even larger transformer models trained on even larger corpora (models that show excellent next-word prediction performance) nevertheless exhibit a worse fit to human reading times than less capable models such as the GPT-2 model we tested (Oh and Schuler, 2023; Shain et al., 2022), reversing an earlier trend observed with weaker models (Wilcox et al., 2020; Goodkind and Bicknell, 2018). This suggests that further improving the underlying language model's next-word-prediction accuracy is unlikely to improve its surprisal-based estimates of our effects of interest.…”
Section: Implications for Theories of Sentence Processing
confidence: 87%
“…A full description of left-corner parsing models of sentence comprehension is beyond the scope of this presentation (see, e.g., Oh et al., 2021; Rasmussen & Schuler, 2018), which is restricted to the minimum details needed to define the predictors covered here. At a high level, phrasal structure derives from a sequence of lexical match (±L) and grammatical match (±G) decisions made at each word (see Oh et al., 2021 for relations to equivalent terms in the prior parsing literature). In terms of memory structures, the lexical decision depends on whether a new element (representing the current word and its hypothesized part of speech) matches current expectations about the upcoming syntactic category; if so, it is composed with the derivation at the front of the memory store (+L), and if not, it is pushed to the store as a new derivation fragment (-L).…”
Section: Left Corner Predictors
confidence: 99%
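To make the ±L decision in the quoted passage concrete, here is a minimal sketch in Python, assuming a simplified memory store of derivation fragments; the names (DerivationFragment, expects, compose, lexical_decision) and the equality-based category match are illustrative assumptions, not the parser's actual implementation:

# Minimal sketch of the +/-L lexical decision in a left-corner parser.
# The memory store holds derivation fragments; the front fragment carries
# an expectation about the upcoming syntactic category.

class DerivationFragment:
    def __init__(self, expected_category=None):
        self.expected_category = expected_category
        self.children = []

    def expects(self, category):
        # Simplified match test: exact category equality.
        return self.expected_category == category

    def compose(self, word, category):
        # Attach the new lexical element to this derivation fragment.
        self.children.append((word, category))


def lexical_decision(store, word, hypothesized_category):
    """Return '+L' or '-L' and update the memory store accordingly."""
    if store and store[0].expects(hypothesized_category):
        # +L: the new element matches the front fragment's expectation,
        # so it is composed with the derivation at the front of the store.
        store[0].compose(word, hypothesized_category)
        return "+L"
    # -L: no match, so the word is pushed to the store as a new
    # derivation fragment.
    new_fragment = DerivationFragment()
    new_fragment.compose(word, hypothesized_category)
    store.insert(0, new_fragment)
    return "-L"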
“…This article is an extended presentation of Oh et al. (2021), with additional algorithmic details of the left-corner parser and evaluations of structural parsers and neural LMs as surprisal estimators. These additional evaluations include a quantitative analysis of the effect of model capacity on predictive power for neural LMs, as well as a replication of the main experiments using a different regression method that is sensitive to temporal diffusion.…”
Section: Introduction
confidence: 99%