While word predictability from sentence context is typically investigated via cloze completion probabilities (CCP), it can be understood more deeply by relying on language models (LMs), which allow us to define the three key components of memory. Memory starts with experience, as implemented by a text corpus: here, Wikipedia capturing general knowledge and (movie) subtitles approximating social interactions. LMs then consolidate a long-term memory structure from experience, as addressed by n-gram, topic, and recurrent neural network (RNN) models. Retrieval was investigated by predicting fixation durations in an English and a German reading sample. Item-level regressions showed greater correlations of LMs with single-fixation duration (SFD), gaze duration (GD), and total viewing time (TVT) than CCP. When each fixation case was predicted separately using generalized additive models, the three LMs together always outperformed CCP. When single LMs were tested against the typically sized English CCP sample (N = 30), LMs usually performed better than CCP (8 vs. 3 cases). The larger German CCP sample (N = 272), however, often performed better than single LMs (4 vs. 2). Subtitle-trained n-gram probabilities of the present (and last) word allowed reliable predictions of all fixation durations. Wikipedia-trained topic probabilities of the last and present word allowed reliable predictions of late GD and TVT effects. The present-word predictions of the RNN were less sensitive to the choice of training corpus and are recommendable if a single LM is used. Moreover, its reliable next-word probability effects make the RNN most suitable for addressing parafoveal preview and top-down prediction.
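As a minimal illustration of the simplest LM component above, the sketch below estimates bigram probabilities P(word | previous word) by maximum likelihood from a toy "experience" corpus. The corpus and function names are hypothetical stand-ins, not the study's actual pipeline, which trained on Wikipedia and subtitle corpora.

```python
from collections import Counter

def bigram_probs(corpus_sentences):
    """Estimate bigram probabilities P(w_t | w_{t-1}) by maximum likelihood."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus_sentences:
        tokens = ["<s>"] + sentence.lower().split()
        # Count each context word and each (context, word) pair
        unigrams.update(tokens[:-1])
        bigrams.update(zip(tokens[:-1], tokens[1:]))
    return lambda prev, word: (bigrams[(prev, word)] / unigrams[prev]
                               if unigrams[prev] else 0.0)

# Toy corpus standing in for Wikipedia / subtitle training text
corpus = ["the dog barked", "the dog slept", "the cat slept"]
p = bigram_probs(corpus)
print(p("the", "dog"))  # 2 of 3 continuations after "the"
```

In the study's setting, such word probabilities (or their log, surprisal) would enter item-level regressions or generalized additive models as predictors of SFD, GD, and TVT.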
The present study uses a computational approach to examine the role of semantic constraints in normal reading. This methodology avoids confounds inherent in conventional measures of predictability, allowing for theoretically deeper accounts of semantic processing. We start from a definition of associations between words based on a significant log likelihood that two words co-occur frequently in the sentences of a large text corpus. Direct associations between stimulus words were controlled, and semantic feature overlap between prime and target words was manipulated via their common associates. The stimuli consisted of sentences of the form pronoun, verb, article, adjective, and noun, followed by a series of closed-class words, e.g., "She rides the grey elephant on one of her many exploratory voyages". The results showed that verb-noun overlap reduces single and first fixation durations of the target noun, and adjective-noun overlap reduces go-past durations. A dynamic spreading-of-activation account suggests that associates of the prime words take some time to become activated: the verb can act on the early eye-movement measures of the target noun presented three words later, whereas the adjective is presented immediately before the target, which induces sentence re-examination after a difficult adjective-noun semantic integration.
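The co-occurrence significance test mentioned above is commonly computed as Dunning's log-likelihood ratio (G²) over a 2x2 contingency table of sentence counts. The sketch below is one standard formulation under that assumption; the cell counts in the example are invented for illustration, not taken from the study's corpus.

```python
import math

def log_likelihood_ratio(k11, k12, k21, k22):
    """Dunning's G^2 for a 2x2 co-occurrence table:
    k11 = sentences containing both words, k12/k21 = only one word,
    k22 = sentences containing neither word."""
    def h(*ks):
        # Unnormalized entropy-style term: sum k * log(k / total)
        total = sum(ks)
        return sum(k * math.log(k / total) for k in ks if k > 0)
    return 2 * (h(k11, k12, k21, k22)
                - h(k11 + k12, k21 + k22)    # row marginals
                - h(k11 + k21, k12 + k22))   # column marginals

# Perfectly independent counts yield G^2 = 0; association raises it
print(log_likelihood_ratio(25, 25, 25, 25))
print(log_likelihood_ratio(30, 20, 20, 30))
```

Word pairs whose G² exceeds a significance threshold would count as associated, and common associates of prime and target then quantify their semantic feature overlap.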