In human sentence processing, it is known that a word's probability in context has large effects on how long the word takes to read. This relationship has been quantified using information-theoretic surprisal, or the amount of new information conveyed by a word. Here, we compare surprisals derived from a collection of language models based on n-grams, neural networks, and a combination of both. We show that the models' psychological predictive power improves as a tight linear function of language model linguistic quality. We also show that the size of the surprisal effect is estimated consistently across all types of language models. These findings point toward a surprising robustness of surprisal estimates and suggest that surprisals estimated by low-quality language models are not biased.
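As a minimal illustration of the quantity compared across models, surprisal is the negative log probability of a word given its context; the sketch below assumes the probability estimate is already available from some language model (the function name and example values are illustrative, not from the paper):

```python
import math

def surprisal(prob_word_given_context: float) -> float:
    """Surprisal in bits: the new information conveyed by a word,
    computed as -log2 of its probability in context."""
    return -math.log2(prob_word_given_context)

# A word assigned probability 0.25 by the model carries 2 bits of surprisal;
# lower-probability (more surprising) words carry more.
print(surprisal(0.25))   # 2.0
print(surprisal(0.5))    # 1.0
```

Under this definition, any model that assigns conditional word probabilities (n-gram or neural) yields surprisal estimates, which is what allows the comparison described above.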