A growing body of evidence shows that ongoing oscillations in auditory cortex modulate their phase to match the rhythm of temporally regular acoustic stimuli, increasing sensitivity to relevant environmental cues and improving detection accuracy. In the current study, we test the hypothesis that nonsensory information provided by linguistic content enhances phase-locked responses to intelligible speech in the human brain. Sixteen adults listened to meaningful sentences while we recorded neural activity using magnetoencephalography. Stimuli were processed using a noise-vocoding technique to vary intelligibility while keeping the temporal acoustic envelope consistent. We show that the acoustic envelopes of sentences contain most power between 4 and 7 Hz and that it is in this frequency band that phase locking between neural activity and envelopes is strongest. Bilateral oscillatory neural activity phase-locked to unintelligible speech, but this cerebro-acoustic phase locking was enhanced when speech was intelligible. This enhanced phase locking was left lateralized and localized to left temporal cortex. Together, our results demonstrate that entrainment to connected speech depends not only on acoustic characteristics, but also on listeners' ability to extract linguistic information. This suggests a biological framework for speech comprehension in which acoustic and linguistic cues reciprocally aid in stimulus prediction.
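The noise-vocoding manipulation described above can be illustrated with a short signal-processing sketch: speech is split into frequency bands, each band's amplitude envelope is extracted, and the envelopes are used to modulate band-limited noise. Fewer bands yield less intelligible speech while the broadband temporal envelope is largely preserved. This is a minimal sketch, not the authors' exact stimulus pipeline; the band spacing, filter orders, and helper functions shown are illustrative assumptions.

```python
# Minimal noise-vocoding sketch (illustrative only; not the authors' exact pipeline).
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def noise_vocode(speech, fs, n_bands=4, f_lo=100.0, f_hi=8000.0):
    """Return a noise-vocoded version of `speech` (1-D float array at sample rate fs)."""
    # Logarithmically spaced band edges (an assumption; published vocoders often
    # use cochlear-equivalent spacing instead).
    edges = np.geomspace(f_lo, f_hi, n_bands + 1)
    noise = np.random.default_rng(0).standard_normal(len(speech))
    out = np.zeros_like(speech)
    for lo, hi in zip(edges[:-1], edges[1:]):
        sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
        band = sosfiltfilt(sos, speech)
        env = np.abs(hilbert(band))         # amplitude envelope of this band
        carrier = sosfiltfilt(sos, noise)   # band-limited noise carrier
        out += env * carrier
    # Match overall RMS so loudness is roughly comparable to the original.
    out *= np.sqrt(np.mean(speech ** 2) / (np.mean(out ** 2) + 1e-12))
    return out
```

Because the band envelopes of the original signal are retained, the vocoded stimulus keeps the slow temporal modulations that drive the 4–7 Hz phase locking reported above, even when spectral detail (and hence intelligibility) is reduced.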
A key feature of speech is the quasi-regular rhythmic information contained in its slow amplitude modulations. In this article we review the information conveyed by speech rhythm, and the role of ongoing brain oscillations in listeners’ processing of this content. Our starting point is the fact that speech is inherently temporal, and that rhythmic information conveyed by the amplitude envelope contains important markers for place and manner of articulation, segmental information, and speech rate. Behavioral studies demonstrate that listeners rely on amplitude envelope information, which plays a key role in speech intelligibility. Extending behavioral findings, data from neuroimaging – particularly electroencephalography (EEG) and magnetoencephalography (MEG) – point to phase locking of ongoing cortical oscillations to low-frequency information (~4–8 Hz) in the speech envelope. This phase modulation effectively encodes a prediction of when important events (such as stressed syllables) are likely to occur, and acts to increase sensitivity to these relevant acoustic cues. We suggest a framework through which such neural entrainment to speech rhythm can explain effects of speech rate on word and segment perception (i.e., that the perception of phonemes and words in connected speech is influenced by preceding speech rate). Neuroanatomically, acoustic amplitude modulations are processed largely bilaterally in auditory cortex, with intelligible speech resulting in differential recruitment of left-hemisphere regions. Notable among these is lateral anterior temporal cortex, which we propose functions in a domain-general fashion to support ongoing memory and integration of meaningful input. Together, the reviewed evidence suggests that low-frequency oscillations in the acoustic speech signal form the foundation of a rhythmic hierarchy supporting spoken language, mirrored by phase-locked oscillations in the human brain.
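The cerebro-acoustic phase locking discussed in these two abstracts is commonly quantified by comparing the instantaneous phase of the band-limited speech envelope with that of the neural signal. The sketch below shows one standard way to do this (a theta-band phase-locking value); the 4–8 Hz band limits, function names, and the assumption of pre-aligned, resampled signals are illustrative, and real MEG/EEG pipelines add epoching, artifact rejection, and source localization.

```python
# Sketch of theta-band (4-8 Hz) cerebro-acoustic phase locking (illustrative assumptions:
# `speech` and `neural` are 1-D arrays resampled to a common rate `fs` and time-aligned).
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def theta_phase_locking(speech, neural, fs, band=(4.0, 8.0)):
    """Phase-locking value (0-1) between the speech envelope and one neural channel."""
    envelope = np.abs(hilbert(speech))                        # broadband amplitude envelope
    sos = butter(4, band, btype="bandpass", fs=fs, output="sos")
    phi_env = np.angle(hilbert(sosfiltfilt(sos, envelope)))   # theta-band envelope phase
    phi_neu = np.angle(hilbert(sosfiltfilt(sos, neural)))     # theta-band neural phase
    # Mean resultant length of the phase difference: 1 = perfect locking, 0 = none.
    return np.abs(np.mean(np.exp(1j * (phi_env - phi_neu))))
```

A value near 1 indicates that the neural oscillation maintains a consistent phase relationship with the speech envelope, which is the signature reported to be strongest in the 4–8 Hz range and enhanced for intelligible speech.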
Everyday conversation frequently includes challenges to the clarity of the acoustic speech signal, including hearing impairment, background noise, and foreign accents. Although an obvious problem is the increased risk of making word identification errors, extracting meaning from a degraded acoustic signal is also cognitively demanding, which contributes to increased listening effort. The concepts of cognitive demand and listening effort are critical in understanding the challenges listeners face in comprehension, which are not fully predicted by audiometric measures. Here I review converging behavioral, pupillometric, and neuroimaging evidence that understanding acoustically degraded speech requires additional cognitive support, and that this cognitive load can interfere with other operations such as language processing and memory for what has been heard. Behaviorally, acoustic challenge is associated with increased errors in speech understanding, poorer performance on concurrent secondary tasks, more difficulty processing linguistically complex sentences, and reduced memory for verbal material. Measures of pupil dilation provide converging evidence for the challenge of processing a degraded acoustic signal, indirectly reflecting increased neural activity. Finally, functional brain imaging reveals that the neural resources required to understand degraded speech extend beyond traditional perisylvian language networks, most commonly including regions of prefrontal cortex, premotor cortex, and the cingulo-opercular network. Far from being exclusively an auditory problem, acoustic degradation presents listeners with a systems-level challenge that requires the allocation of executive cognitive resources. An important point is that a number of dissociable processes can be engaged to understand degraded speech, including verbal working memory and attention-based performance monitoring. The specific resources required likely differ as a function of the acoustic, linguistic, and cognitive demands of the task, as well as individual differences in listeners' abilities. A greater appreciation of cognitive contributions to processing degraded speech is critical for understanding individual differences in comprehension ability, explaining variability in the efficacy of assistive devices, and guiding rehabilitation approaches that reduce listening effort and facilitate communication.
In everyday life, people often hear speech that has been degraded (e.g., by background noise or electronic transmission) or listen while distracted by other tasks. However, it remains unclear what role attention plays in processing speech that is difficult to understand. In the current study, we used functional magnetic resonance imaging to assess the degree to which spoken sentences were processed under distraction, and whether this depended on the acoustic quality (intelligibility) of the speech. On every trial, adult human participants attended to one of three simultaneously presented stimuli: a sentence (at one of four acoustic clarity levels), an auditory distracter, or a visual distracter. A postscan recognition test showed that clear speech was processed even when not attended, but that attention greatly enhanced the processing of degraded speech. Furthermore, speech-sensitive cortex could be parcellated according to how speech-evoked responses were modulated by attention. Responses in auditory cortex and areas along the superior temporal sulcus (STS) took the same form regardless of attention, although responses to distorted speech in portions of both posterior and anterior STS were enhanced under directed attention. In contrast, frontal regions, including left inferior frontal gyrus, were only engaged when listeners were attending to speech, and these regions exhibited elevated responses to degraded, compared with clear, speech. We suggest this response is a neural marker of effortful listening. Together, our results suggest that attention enhances the processing of degraded speech by engaging higher-order mechanisms that modulate perceptual auditory processing.
A striking feature of human perception is that our subjective experience depends not only on sensory information from the environment but also on our prior knowledge or expectations. The precise mechanisms by which sensory information and prior knowledge are integrated remain unclear, with longstanding disagreement concerning whether integration is strictly feedforward or whether higher-level knowledge influences sensory processing through feedback connections. Here we used concurrent EEG and MEG recordings to determine how sensory information and prior knowledge are integrated in the brain during speech perception. We manipulated listeners' prior knowledge of speech content by presenting matching, mismatching, or neutral written text before a degraded (noise-vocoded) spoken word. When speech conformed to prior knowledge, subjective perceptual clarity was enhanced. This enhancement in clarity was associated with a spatiotemporal profile of brain activity uniquely consistent with a feedback process: activity in the inferior frontal gyrus was modulated by prior knowledge before activity in lower-level sensory regions of the superior temporal gyrus. In parallel, we parametrically varied the level of speech degradation, and therefore the amount of sensory detail, so that changes in neural responses attributable to sensory information and prior knowledge could be directly compared. Although sensory detail and prior knowledge both enhanced speech clarity, they had an opposite influence on the evoked response in the superior temporal gyrus. We argue that these data are best explained within the framework of predictive coding, in which sensory activity is compared with top-down predictions and only unexplained activity is propagated through the cortical hierarchy.
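To make the predictive-coding account above concrete, the toy sketch below treats the modeled sensory response as a prediction error: the part of the degraded input not explained by top-down predictions. When prior knowledge (matching text) supplies an accurate prediction, the residual error is smaller despite identical sensory input. All variable names and the simple vector formulation are illustrative assumptions, not the authors' computational model.

```python
# Toy predictive-coding illustration (not the authors' model): the simulated
# sensory-region response is the residual not explained by top-down predictions.
import numpy as np

def prediction_error(sensory_input, prediction):
    """Unexplained activity that would propagate up the cortical hierarchy."""
    return sensory_input - prediction

rng = np.random.default_rng(0)
clear_speech = rng.standard_normal(100)                 # idealized clear speech features
degraded = clear_speech + 0.5 * rng.standard_normal(100)  # degraded (noisy) input

matching_prior = clear_speech    # listener read matching text beforehand
neutral_prior = np.zeros(100)    # no useful prior knowledge

# Residual error (modeled evoked response) is smaller with a matching prior.
print(np.linalg.norm(prediction_error(degraded, matching_prior)))
print(np.linalg.norm(prediction_error(degraded, neutral_prior)))
```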
Hearing loss is one of the most common complaints in adults over the age of 60 and a major contributor to difficulties in speech comprehension. To examine the effects of hearing ability on the neural processes supporting spoken language processing in humans, we used functional magnetic resonance imaging (fMRI) to monitor brain activity while older adults with age-normal hearing listened to sentences that varied in their linguistic demands. Individual differences in hearing ability predicted the degree of language-driven neural recruitment during auditory sentence comprehension in bilateral superior temporal gyri (including primary auditory cortex), thalamus, and brainstem. In a second experiment we examined the relationship of hearing ability to cortical structural integrity using voxel-based morphometry (VBM), demonstrating a significant linear relationship between hearing ability and gray matter volume in primary auditory cortex. Together, these results suggest that even moderate declines in peripheral auditory acuity lead to a systematic downregulation of neural activity during the processing of higher-level aspects of speech, and may also contribute to loss of gray matter volume in primary auditory cortex. More generally these findings support a resource-allocation framework in which individual differences in sensory ability help define the degree to which brain regions are recruited in service of a particular task.
Speech comprehension remains largely preserved in older adults despite significant age-related neurophysiological change. However, older adults' performance declines more rapidly than that of young adults when listening conditions are challenging. We investigated the cortical network underlying speech comprehension in healthy aging using short sentences differing in syntactic complexity, with processing demands further manipulated through speech rate. Neural activity was monitored using blood oxygen level-dependent functional magnetic resonance imaging. Comprehension of syntactically complex sentences activated components of a core sentence-processing network in both young and older adults, including the left inferior and middle frontal gyri, left inferior parietal cortex, and left middle temporal gyrus. However, older adults showed reduced recruitment of inferior frontal regions relative to young adults; the individual degree of recruitment predicted accuracy at the more difficult fast speech rate. Older adults also showed increased activity in frontal regions outside the core sentence-processing network, which may have played a compensatory role. Finally, a functional connectivity analysis demonstrated reduced coherence between activated regions in older adults. We conclude that decreased activation of specialized processing regions, and limited ability to coordinate activity between regions, contribute to older adults' difficulty with sentence comprehension under difficult listening conditions.