Despite a large body of research, we still lack a detailed account of how auditory processing of continuous speech unfolds in the human brain. Previous work has shown that low-level acoustic features of speech propagate from posterior toward anterior superior temporal gyrus (Hullett et al., 2016). In this study, we investigate what happens to these neural representations beyond the superior temporal gyrus and how they engage higher-level language-processing areas such as the inferior frontal gyrus. We used low-level sound features to model neural responses to speech outside of the primary auditory cortex in human participants (both males and females), using two complementary imaging techniques: electrocorticography (ECoG) and fMRI. Both techniques revealed tuning of the perisylvian cortex to low-level speech features. With ECoG, we found evidence that the temporal features of speech sounds propagate along the ventral language-processing pathway toward the inferior frontal gyrus. Increasingly coarse temporal features of speech, spreading from posterior superior temporal cortex toward inferior frontal gyrus, were associated with linguistic features such as voice onset time, duration of formant transitions, and phoneme, syllable, and word boundaries. The present findings provide the groundwork for a comprehensive bottom-up account of speech comprehension in the human brain.

During natural speech comprehension, a broad network of perisylvian cortical regions is involved in sound and language processing. Here, we investigated tuning to low-level sound features within these regions using neural responses to a short feature film, and asked whether the tuning organization across these regions parallels the hierarchy of language structures in continuous speech. Our results show that low-level speech features propagate throughout the perisylvian cortex and may contribute to the emergence of "coarse" speech representations in the inferior frontal gyrus, a region typically associated with high-level language processing. These findings add to previous work on auditory processing and underline a distinctive role for the inferior frontal gyrus in natural speech comprehension.
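The modeling approach described above can be illustrated with a lagged linear encoding model. The sketch below is a minimal illustration, assuming precomputed low-level sound features and high-frequency-band neural responses at a common sampling rate; the file names, lag window, and ridge penalty grid are illustrative assumptions, not the study's exact pipeline.

```python
import numpy as np
from scipy.stats import pearsonr
from sklearn.linear_model import RidgeCV

def add_lags(features, n_lags):
    """Stack time-lagged copies of the stimulus features so the model
    can use the recent stimulus history (0 .. n_lags-1 samples back)."""
    n_samples, n_feats = features.shape
    lagged = np.zeros((n_samples, n_feats * n_lags))
    for lag in range(n_lags):
        lagged[lag:, lag * n_feats:(lag + 1) * n_feats] = features[:n_samples - lag]
    return lagged

# X: (time, features) low-level sound features; Y: (time, electrodes)
# high-frequency-band neural activity. File names are hypothetical.
X = np.load("sound_features.npy")
Y = np.load("highgamma_responses.npy")

X_lagged = add_lags(X, n_lags=20)   # e.g., 200 ms of history at 100 Hz
split = int(0.8 * len(X_lagged))    # hold out the last 20% for testing

model = RidgeCV(alphas=np.logspace(0, 6, 7))
model.fit(X_lagged[:split], Y[:split])
pred = model.predict(X_lagged[split:])

# Prediction accuracy per electrode: correlation between predicted and
# observed held-out responses.
scores = [pearsonr(pred[:, e], Y[split:, e])[0] for e in range(Y.shape[1])]
print(f"median electrode r = {np.median(scores):.2f}")
```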
Understanding how the human brain processes auditory input remains a challenge. Traditionally, a distinction between lower- and higher-level sound features is made, but their definition depends on a specific theoretical framework and might not match the neural representation of sound. Here, we postulate that constructing a data-driven neural model of auditory perception, with minimal theoretical assumptions about the relevant sound features, could provide an alternative approach and possibly a better match to the neural responses. We collected electrocorticography recordings from six patients who watched a long feature film. The raw movie soundtrack was used to train an artificial neural network model to predict the associated neural responses. The model achieved high prediction accuracy and generalized well to a second dataset, in which new participants watched a different film. The extracted bottom-up features captured acoustic properties specific to the type of sound and were associated with distinct response latency profiles and cortical distributions. Specifically, several features encoded speech-related acoustic properties, with some exhibiting shorter latency profiles (associated with responses in posterior perisylvian cortex) and others exhibiting longer latency profiles (associated with responses in anterior perisylvian cortex). Our results support and extend the current view of speech perception by demonstrating the presence of temporal hierarchies in the perisylvian cortex and the involvement of cortical sites outside this region during audiovisual speech perception.
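As a rough illustration of this data-driven approach, the sketch below maps raw waveform segments to multi-electrode responses with a small 1-D convolutional network, so that the learned convolutional filters play the role of the bottom-up sound features. The architecture, layer sizes, and sampling rate are hypothetical assumptions, not the network used in the study.

```python
import torch
import torch.nn as nn

class AudioEncoder(nn.Module):
    """Toy audio-to-brain encoder: raw waveform in, one predicted
    response value per electrode out."""
    def __init__(self, n_electrodes):
        super().__init__()
        # Convolutional stack: the learned filters act as data-driven
        # "bottom-up" sound features.
        self.features = nn.Sequential(
            nn.Conv1d(1, 32, kernel_size=64, stride=4), nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=32, stride=4), nn.ReLU(),
            nn.Conv1d(64, 64, kernel_size=16, stride=4), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
        )
        self.readout = nn.Linear(64, n_electrodes)

    def forward(self, wave):            # wave: (batch, 1, samples)
        h = self.features(wave).squeeze(-1)
        return self.readout(h)          # predicted response per electrode

model = AudioEncoder(n_electrodes=128)
wave = torch.randn(8, 1, 16000)         # eight dummy 1-s segments at 16 kHz
pred = model(wave)                       # (8, 128) predicted responses
```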
Various brain regions are implicated in speech processing, and the specific function of some is better understood than that of others. In particular, the involvement of the dorsal precentral cortex (dPCC) in speech perception remains debated, and the function of this region is commonly attributed to motor processing alone. In this study, we investigated high-density intracranial responses to speech fragments of a feature film, aiming to determine whether dPCC is engaged in the perception of continuous speech. Our findings show that dPCC exhibited a preference for speech over the other tested sounds. Moreover, the identified area tracked auditory properties of speech, including its spectral envelope, rhythmic phrasal pattern, and pitch contour. dPCC also showed the ability to filter out noise from the perceived speech. Comparison of these results with data from motor experiments showed that the identified region occupied a distinct location in dPCC, anterior to the hand motor area and superior to the mouth articulator region. The present findings, uncovered with high-density intracranial recordings, help elucidate the functional specialization of the precentral cortex and demonstrate the unique role of its anterior dorsal region in continuous speech perception.
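One way to quantify the speech tracking described here is to correlate an electrode's activity with the speech amplitude envelope across a range of neural lags. The sketch below is illustrative only: file names, sampling rates, and the lag window are assumptions, and the study's actual analyses (e.g., of phrasal rhythm and pitch contour) go beyond this simple envelope correlation.

```python
import numpy as np
from scipy.signal import hilbert, resample

audio = np.load("speech_audio.npy")          # hypothetical waveform, 16 kHz
neural = np.load("electrode_highgamma.npy")  # hypothetical HFB trace, 100 Hz

audio_fs, neural_fs = 16000, 100

# Broadband speech envelope via the Hilbert transform, resampled to the
# neural sampling rate.
env = np.abs(hilbert(audio))
env = resample(env, int(len(audio) / audio_fs * neural_fs))
n = min(len(env), len(neural))
env, neural = env[:n], neural[:n]

# Lagged correlation: how well does the electrode track the envelope,
# and at what neural delay?
lags = range(0, 50)                          # 0-490 ms in 10 ms steps
rs = [np.corrcoef(env[:n - lag], neural[lag:])[0, 1] for lag in lags]
best = int(np.argmax(rs))
print(f"peak tracking at {lags[best] * 1000 // neural_fs} ms lag, "
      f"r = {rs[best]:.2f}")
```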
Research on how the human brain extracts meaning from sensory input relies largely on methodological reductionism. In the present study, we adopt a more holistic approach by modeling cortical responses to semantic information extracted from the visual stream of a feature film using artificial neural network models. Advances in both computer vision and natural language processing were utilized to extract semantic representations from the film by combining perceptual and linguistic information. We tested whether these representations were useful for studying human brain data. To this end, we collected electrocorticography responses to a short movie from 37 subjects and fitted their cortical response patterns across multiple regions using the semantic components extracted from the film frames. We found that individual semantic components reflected fundamental semantic distinctions in the visual input, such as the presence or absence of people, human movement, landscape scenes, and human faces. Moreover, each semantic component mapped onto a distinct functional cortical network involving high-level cognitive regions in occipitotemporal, frontal, and parietal cortices. The present work demonstrates the potential of data-driven methods from information-processing fields to explain patterns of cortical responses, and contributes to the overall discussion of how high-level perceptual information is encoded in the human brain.

Semantic processing of audiovisual material is a topic of great interest in cognitive neuroscience. Much knowledge has been accrued through studies focusing on visual object processing, object categorization, and the processing of various semantic attributes of visually perceived objects [1-3]. We have come to understand a great deal about the ventral stream of visual processing in the brain [4,5], the object-categorization ability of inferior temporal cortex [6-8], as well as the specific roles of the fusiform face area [9], the parahippocampal place area [10], the motion-processing temporal region [11], and other areas involved in high-level visual processing [4,12-14]. Most studies address the topic with a reductionist approach, using carefully constructed tasks and stimuli to investigate a particular aspect of brain function [15]. With new tools available to investigate high-dimensional phenomena, more holistic approaches become feasible. Complex brain mechanisms may be identified or characterized by mapping brain responses onto informational structures extracted from unconstrained sensory material. More concretely, one can utilize recent advances in computational modeling in domains such as computer vision and natural language processing, which are driven by the extraction of high-level semantic relations directly from unprocessed input, such as images [16,17] and texts [18-20]. Among the most recent and most powerful such advances are deep artificial neural network models. Not only do these models achieve unprecedented performance on solving complex tasks (for example, visual object i...
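To make the general recipe behind this abstract concrete: frame embeddings from a pretrained vision model can be reduced to a small number of semantic components, which then serve as regressors for cortical responses. The sketch below is a minimal illustration; the backbone (ResNet-50), the component count, and the file layout are assumptions, not the models or extraction pipeline used in the study.

```python
import glob
import numpy as np
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image
from sklearn.decomposition import PCA

# Pretrained backbone with the classifier removed, exposing the
# 2048-d penultimate feature vector for each frame.
backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()
backbone.eval()

prep = T.Compose([T.Resize(256), T.CenterCrop(224), T.ToTensor(),
                  T.Normalize(mean=[0.485, 0.456, 0.406],
                              std=[0.229, 0.224, 0.225])])

frames = sorted(glob.glob("film_frames/*.png"))   # hypothetical frame dump
with torch.no_grad():
    feats = np.stack([
        backbone(prep(Image.open(f).convert("RGB")).unsqueeze(0)).numpy()[0]
        for f in frames
    ])

# Reduce the frame embeddings to a handful of components; each component
# can then be used as a regressor for the electrocorticography responses.
components = PCA(n_components=10).fit_transform(feats)  # (n_frames, 10)
```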
Intracranial human recordings are a valuable and rare source of information about the brain. Making such data publicly available not only helps tackle reproducibility issues in science, it also allows more use to be made of these valuable data. This is especially true for data collected using naturalistic tasks. Here, we describe a dataset collected from a large group of human subjects while they watched a short audiovisual film. The dataset has several unique features. First, it includes a large amount of intracranial electroencephalography (iEEG) data (51 participants, aged 5-55 years, who all performed the same task). Second, it includes functional magnetic resonance imaging (fMRI) recordings (30 participants, aged 7-47 years) acquired during the same task. Eighteen participants performed both the iEEG and fMRI versions of the task, non-simultaneously. Third, the data were acquired using a rich audiovisual stimulus, for which we provide detailed speech and video annotations. This dataset can be used to study the neural mechanisms of multimodal perception and language comprehension, as well as the similarity of neural signals across brain recording modalities.
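For readers who want to work with such a dataset, the sketch below shows how one subject's iEEG run could be read with MNE-BIDS, assuming the data follow the BIDS standard. The root path, subject label, and task name are hypothetical placeholders; consult the dataset documentation for the actual entities.

```python
# Hypothetical example of reading one subject's iEEG run with MNE-BIDS,
# assuming a BIDS-formatted dataset; paths and entity names are placeholders.
from mne_bids import BIDSPath, read_raw_bids

bids_path = BIDSPath(root="ieeg_film_dataset", subject="01",
                     task="film", datatype="ieeg", suffix="ieeg")
raw = read_raw_bids(bids_path)   # returns an annotated mne.io.Raw object
print(raw.info["sfreq"], "Hz,", len(raw.ch_names), "channels")
```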
Development of brain-computer interface (BCI) technology is key for enabling communication in individuals who have lost the faculty of speech due to severe motor paralysis. A BCI control strategy that is gaining attention employs speech decoding from neural data. Recent studies have shown that a combination of direct neural recordings and advanced computational models can provide promising results. Understanding which decoding strategies deliver best and directly applicable results is crucial for advancing the field. In this paper, we optimized and validated a decoding approach based on speech reconstruction directly from high-density electrocorticography recordings from sensorimotor cortex during a speech production task. We show that 1) dedicated machine learning optimization of reconstruction models is key for achieving the best reconstruction performance; 2) individual word decoding in reconstructed speech achieves 92-100\% accuracy (chance level is 8\%); 3) direct reconstruction from sensorimotor brain activity produces intelligible speech. These results underline the need for model optimization in achieving best speech decoding results and highlight the potential that reconstruction-based speech decoding from sensorimotor cortex can offer for development of next-generation BCI technology for communication.
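The reconstruction pipeline can be sketched as a regularized linear mapping from windowed sensorimotor activity to mel-spectrogram frames of the produced speech, with the regularization strength tuned by cross-validation (the "dedicated optimization" step). The array shapes, file names, and hyperparameter grid below are illustrative assumptions, not the study's exact models.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

# X: lagged high-density ECoG features; Y: target mel-spectrogram frames.
X = np.load("ecog_hfb_lagged.npy")   # (time, electrodes * lags), hypothetical
Y = np.load("speech_mel.npy")        # (time, mel_bins), hypothetical

split = int(0.8 * len(X))            # train on the first 80% of the task

# Hyperparameter optimization of the reconstruction model.
search = GridSearchCV(Ridge(), {"alpha": np.logspace(-2, 5, 8)}, cv=5)
search.fit(X[:split], Y[:split])
recon = search.predict(X[split:])    # reconstructed mel frames

# Reconstruction quality per mel bin: correlation with the actual speech.
rs = [np.corrcoef(recon[:, b], Y[split:, b])[0, 1] for b in range(Y.shape[1])]
print(f"mean spectrogram correlation r = {np.mean(rs):.2f}")
```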