2022
DOI: 10.1101/2022.09.22.509104
Preprint

A natural language fMRI dataset for voxelwise encoding models

Abstract: Speech comprehension is a complex process that draws on humans' abilities to extract lexical information, parse syntax, and form semantic understanding. These sub-processes have traditionally been studied using separate neuroimaging experiments that attempt to isolate specific effects of interest. More recently, it has become possible to study all stages of language comprehension in a single neuroimaging experiment using narrative natural language stimuli. The resulting data are richly varied at every level, ena…

Cited by 8 publications
(8 citation statements)
References 44 publications
“…Moreover, unlike a non-human primate with chronically implanted electrodes, where it is possible to obtain neural responses to thousands of stimuli (over the course of days or weeks; e.g., 140), for humans, we are generally more limited in how much data is feasible to collect, both because of general constraints on participants' time and boredom/cognitive fatigue. One recent approach to combat the latter has been to turn to rich naturalistic stimuli, like stories, podcasts, or movies and to collect massive amounts of data (sometimes, many hours' worth) from a small number of individuals (e.g., 119,141,142)-what is often referred to as the 'deep data' approach (e.g., [143][144][145][146][147][148]). However, such stimuli do not sample the space of linguistic and/or semantic variation well, and consequently, do not allow for testing the model on stimuli that differ substantially from those used for training.…”
Section: Discussion (mentioning)
confidence: 99%
“…And second, language processing requires attentional engagement 141, and such engagement is difficult to sustain for an extended period of time, especially if stimuli are repeated. One recent approach to combat fatigue/boredom has been to turn to rich naturalistic stimuli, like stories, podcasts, or movies and to collect massive amounts of data (sometimes, many hours' worth) from a small number of individuals (e.g., 118,142,143)-what is often referred to as the 'deep data' approach (e.g., [144][145][146][147][148][149]). However, such stimuli plausibly do not sample the space of linguistic and/or semantic variation well (see SI 10 for evidence), and consequently, do not allow for testing models on stimuli that differ substantially from those used during training.…”
Section: Discussion (mentioning)
confidence: 99%
“…Naturalistic stimulus data sets are easier to construct and often larger than controlled stimuli. For example, J. Chen et al (2017) publicly released a data set collected on a 50 min movie, Wehbe, Murphy, et al (2014) released data collected on an entire chapter from the Harry Potter books, comprising more than 5,000 words, and LeBel et al (2022) released data collected on over 5 hr of English podcasts per participant. These stimuli also provide a diverse test bed of linguistic phenomena—from a broad array of semantic concepts to rich temporal structure capturing discourse-level information.…”
Section: Experimental Designs In Language Neuroscience (mentioning)
confidence: 99%
“…For this representation to be useful in an encoding analysis, it is important to sample a large set of different stimuli. To this avail, several efforts have been devoted towards the creation of large datasets to be used for testing the encoding of computational models in fMRI [Allen et al, 2022, LeBel et al, 2023]. Next, a link between brain activity (e.g.…”
Section: Introduction (mentioning)
confidence: 99%