Normal listeners possess the remarkable perceptual ability to select a single speech stream among many competing talkers. However, few studies of selective attention have addressed the unique nature of speech as a temporally extended and complex auditory object. We hypothesized that sustained selective attention to speech in a multitalker environment would act as gain control on the early auditory cortical representations of speech. Using high-density electroencephalography and a template-matching analysis method, we found selective gain to the continuous speech content of an attended talker, greatest at a frequency of 4–8 Hz, in auditory cortex. In addition, the difference in alpha power (8–12 Hz) at parietal sites across hemispheres indicated the direction of auditory attention to speech, as has been previously found in visual tasks. The strength of this hemispheric alpha lateralization, in turn, predicted an individual's attentional gain of the cortical speech signal. These results support a model of spatial speech stream segregation, mediated by a supramodal attention mechanism, enabling selection of the attended representation in auditory cortex.
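As a rough illustration of the parietal alpha-lateralization measure described above, the following sketch computes a (right - left) / (right + left) index of alpha (8-12 Hz) power from left and right parietal channels. The channel indices, Welch settings, and simulated data are illustrative assumptions and do not reproduce the study's analysis pipeline.

```python
# Sketch: hemispheric alpha (8-12 Hz) lateralization at parietal sites.
# Assumes `eeg` is an (n_channels, n_samples) array sampled at `fs` Hz;
# the left/right parietal channel indices are illustrative.
import numpy as np
from scipy.signal import welch

def band_power(x, fs, fmin=8.0, fmax=12.0):
    """Mean power spectral density within [fmin, fmax] Hz."""
    freqs, psd = welch(x, fs=fs, nperseg=int(2 * fs))
    band = (freqs >= fmin) & (freqs <= fmax)
    return psd[..., band].mean(axis=-1)

def alpha_lateralization(eeg, fs, left_idx, right_idx):
    """(right - left) / (right + left) alpha power across parietal channels."""
    left = band_power(eeg[left_idx], fs).mean()
    right = band_power(eeg[right_idx], fs).mean()
    return (right - left) / (right + left)

# Example with simulated data: stronger alpha on the "right" parietal channels.
fs = 250
t = np.arange(0, 10, 1 / fs)
rng = np.random.default_rng(0)
eeg = rng.normal(0, 1, (4, t.size))
eeg[2:] += 2 * np.sin(2 * np.pi * 10 * t)   # right-hemisphere channels carry 10 Hz alpha
print(alpha_lateralization(eeg, fs, left_idx=[0, 1], right_idx=[2, 3]))
```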
An increasing number of researchers are exploring variations of the Concealed Knowledge Test (CKT) as alternatives to traditional 'lie-detector' tests. For example, the response time (RT)-based CKT has previously been shown to accurately detect participants who possess privileged knowledge. Although several studies have reported successful RT-based tests, they have focused on verbal stimuli despite the prevalence of photographic evidence in forensic investigations. Related studies comparing pictures and phrases have yielded inconsistent results. The present work compared an RT-CKT using verbal phrases as stimuli to one using pictures of faces. This led to equally accurate and efficient tests using either stimulus type. Results also suggest that previous inconsistent findings may be attributable to study procedures that led to better memory for verbal than visual items. When memory for verbal phrases and pictures was equated, we found nearly identical detection accuracies.
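To make the RT-based detection logic concrete, here is a minimal sketch of one plausible scoring rule in which a participant is flagged as knowledgeable when responses to probe items are reliably slower than responses to irrelevant items. The effect-size criterion and the example reaction times are hypothetical, not the published cutoff or data.

```python
# Sketch: scoring a reaction-time-based Concealed Knowledge Test.
# A participant is flagged as "knowledgeable" when responses to probe items
# (crime-relevant details) are reliably slower than to irrelevant items.
# The effect-size threshold below is illustrative, not a validated cutoff.
import numpy as np

def ckt_score(probe_rts, irrelevant_rts):
    """Cohen's d for the probe-vs-irrelevant RT difference."""
    probe_rts, irrelevant_rts = np.asarray(probe_rts), np.asarray(irrelevant_rts)
    pooled_sd = np.sqrt((probe_rts.var(ddof=1) + irrelevant_rts.var(ddof=1)) / 2)
    return (probe_rts.mean() - irrelevant_rts.mean()) / pooled_sd

probe = [612, 655, 641, 690, 628]        # hypothetical RTs (ms) to probe items
irrelevant = [540, 565, 552, 571, 549]   # hypothetical RTs (ms) to irrelevant items
print("knowledgeable" if ckt_score(probe, irrelevant) > 0.5 else "unknowledgeable")
```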
Objective To reduce stimulus transduction artifacts in EEG while using insert earphones. Design Reference Equivalent Threshold SPLs (RETSPLs) were assessed for Etymotic ER-4B earphones in fifteen volunteers. Auditory brainstem responses (ABRs) and middle latency responses (MLRs) – as well as long-duration complex ABRs (cABRs) – to click and /dɑ/ speech stimuli were recorded in a single-case design. Results Transduction artifacts occurred in raw EEG responses, but they were eliminated by shielding, counter-phasing (averaging across stimuli presented 180° out of phase), or re-referencing. Conclusions Clinical-grade ABRs, MLRs, and cABRs can be recorded with a standard digital EEG system and high-fidelity insert earphones, provided one or more techniques are used to remove the stimulus transduction artifact.
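The counter-phasing technique mentioned in the Results lends itself to a small sketch: because the electromagnetic transduction artifact inverts with stimulus polarity while the neural response does not, averaging epochs recorded to stimuli presented at 0° and 180° cancels the artifact. The simulated waveforms below are assumptions for demonstration only, not recorded data.

```python
# Sketch: counter-phasing to cancel the stimulus transduction artifact.
# The artifact follows stimulus polarity and inverts with it, while the neural
# ABR does not; averaging opposite-polarity epochs therefore removes the artifact.
import numpy as np

def counterphase_average(epochs_0deg, epochs_180deg):
    """Average epochs from opposite-polarity stimuli (each: trials x samples)."""
    return 0.5 * (epochs_0deg.mean(axis=0) + epochs_180deg.mean(axis=0))

# Toy simulation (values are illustrative, not physiological calibration).
fs, n_trials, n_samples = 20000, 500, 200
t = np.arange(n_samples) / fs
rng = np.random.default_rng(1)
neural = 0.5e-6 * np.exp(-((t - 0.006) / 0.001) ** 2)            # toy wave-V-like response
artifact = 2e-6 * np.sin(2 * np.pi * 1000 * t) * (t < 0.002)      # brief transduction artifact
noise = lambda: rng.normal(0, 1e-6, (n_trials, n_samples))
epochs_pos = neural + artifact + noise()
epochs_neg = neural - artifact + noise()                           # artifact flips, response does not
avg = counterphase_average(epochs_pos, epochs_neg)                 # artifact-free average
```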
When speech is interrupted by noise, listeners often perceptually “fill in” the degraded signal, giving an illusion of continuity and improving intelligibility. This phenomenon involves a neural process in which the auditory cortex (AC) response to onsets and offsets of acoustic interruptions is suppressed. Since meaningful visual cues behaviorally enhance this illusory filling-in, we hypothesized that during the illusion, lip movements congruent with acoustic speech should elicit a weaker AC response to interruptions relative to static (no movements) or incongruent visual speech. The AC response to interruptions was measured as the power and inter-trial phase consistency of the auditory evoked theta band (4-8 Hz) activity of the electroencephalogram (EEG) and the N1 and P2 auditory evoked potentials (AEPs). A reduction in the N1 and P2 amplitudes and in theta phase consistency reflected the perceptual illusion at the onset and/or offset of interruptions regardless of visual condition. These results suggest that the brain engages filling-in mechanisms throughout the interruption, which repairs degraded speech lasting up to ~250 ms following the onset of the degradation. Behaviorally, participants perceived greater speech continuity over longer interruptions for congruent compared to incongruent or static audiovisual streams. However, this specific behavioral profile was not mirrored in the neural markers of interest. We conclude that lip-reading enhances illusory perception of degraded speech not by altering the quality of the AC response, but by delaying it during degradations so that longer interruptions can be tolerated.
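A minimal sketch of the inter-trial phase consistency (ITPC) measure referred to above, assuming epoched single-channel EEG in a NumPy array; the theta band limits follow the abstract, but the filter settings and the simulated 6 Hz phase-locked component are illustrative.

```python
# Sketch: inter-trial phase consistency (ITPC) of theta-band (4-8 Hz) EEG.
# ITPC is the magnitude of the mean unit phase vector across trials:
# 0 = random phase from trial to trial, 1 = perfectly phase-locked.
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def theta_itpc(epochs, fs, fmin=4.0, fmax=8.0):
    """ITPC over trials; `epochs` is (n_trials, n_samples)."""
    b, a = butter(4, [fmin, fmax], btype="bandpass", fs=fs)
    phase = np.angle(hilbert(filtfilt(b, a, epochs, axis=-1), axis=-1))
    return np.abs(np.exp(1j * phase).mean(axis=0))

# Simulated trials: a 6 Hz component phase-locked to an interruption onset at t = 0.
fs = 500
t = np.arange(-0.2, 0.6, 1 / fs)
rng = np.random.default_rng(2)
epochs = np.cos(2 * np.pi * 6 * t) * (t >= 0) + rng.normal(0, 1, (100, t.size))
print(theta_itpc(epochs, fs).max())
```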
Audiovisual (AV) integration is essential for speech comprehension, especially in adverse listening situations. Divergent, but not mutually exclusive, theories have been proposed to explain the neural mechanisms underlying AV integration. One theory advocates that this process occurs via interactions between the auditory and visual cortices, as opposed to fusion of AV percepts in a multisensory integrator. Building upon this idea, we proposed that AV integration in spoken language reflects visually induced weighting of phonetic representations at the auditory cortex. EEG was recorded while male and female human subjects watched and listened to videos of a speaker uttering consonant-vowel (CV) syllables /ba/ and /fa/, presented in Auditory-only, AV congruent, or AV incongruent contexts. Subjects reported whether they heard /ba/ or /fa/. We hypothesized that vision alters phonetic encoding by dynamically weighting which phonetic representation in the auditory cortex is strengthened or weakened. That is, when subjects are presented with visual /fa/ and acoustic /ba/ and hear /fa/, the visual input strengthens the weighting of the phone /f/ representation. When subjects are presented with visual /ba/ and acoustic /fa/ and hear /ba/, the visual input weakens the weighting of the phone /f/ representation. Indeed, we found an enlarged N1 auditory evoked potential when subjects perceived the illusory /fa/, and a reduced N1 when they perceived the illusory /ba/, mirroring the N1 behavior for /ba/ and /fa/ in Auditory-only settings. These effects were especially pronounced in individuals with more robust illusory perception. These findings provide evidence that visual speech modifies phonetic encoding at the auditory cortex. The current study presents evidence that audiovisual integration in spoken language occurs when one modality (vision) acts on representations of a second modality (audition). Using the McGurk illusion, we show that visual context primes phonetic representations at the auditory cortex, altering the auditory percept, as evidenced by changes in the N1 auditory evoked potential. This finding reinforces the theory that audiovisual integration occurs via visual networks influencing phonetic representations in the auditory cortex. We believe that this will lead to the generation of new hypotheses regarding cross-modal mapping, particularly whether it occurs via direct or indirect routes (e.g., via a multisensory mediator).
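For concreteness, one simple way to quantify N1 and P2 amplitudes from averaged epochs is sketched below. The latency windows (80-150 ms for N1, 150-250 ms for P2) are typical values used only for illustration and are not taken from the study.

```python
# Sketch: measuring N1 and P2 auditory evoked potential amplitudes from the
# trial-averaged EEG. Latency windows are typical values, not the study's.
import numpy as np

def n1_p2_amplitudes(epochs, times):
    """Return (N1, P2) peak amplitudes; `epochs` is (n_trials, n_samples),
    `times` is the time axis in seconds relative to sound onset."""
    erp = epochs.mean(axis=0)
    n1_win = (times >= 0.080) & (times <= 0.150)
    p2_win = (times >= 0.150) & (times <= 0.250)
    return erp[n1_win].min(), erp[p2_win].max()
```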
The neural mechanism that mediates perceptual filling-in of the blind spot is still under discussion. One hypothesis proposes that the cortical representation of the blind spot is activated only under conditions that elicit perceptual filling-in and requires congruent stimulation on both sides of the blind spot. Alternatively, the passive remapping hypothesis proposes that inputs from regions surrounding the blind spot infiltrate the representation of the blind spot in cortex. This theory predicts that independent stimuli presented to the left and right of the blind spot should lead to neighboring/overlapping activations in visual cortex when the blind-spot eye is stimulated but separate activations when the fellow eye is stimulated. Using functional MRI, we directly tested the remapping hypothesis by presenting flickering checkerboard wedges to the left or right of the spatial location of the blind spot, either to the blind-spot eye or to the fellow eye. Irrespective of which eye was stimulated, we found separate activations corresponding to the left and right wedges. We identified the centroid of the activations on a cortical flat map and measured the distance between activations. Distance measures of the cortical gap across the blind spot were accurate and reliable (mean distance: 6–8 mm across subjects, SD ≈ 1 mm within subjects). Contrary to the predictions of the remapping hypothesis, cortical distances between activations to the two wedges were equally large for the blind-spot eye and fellow eye in areas V1 and V2/V3. Remapping therefore appears unlikely to account for perceptual filling-in at an early cortical level.
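The centroid-distance measurement can be sketched as follows, assuming 2-D flat-map vertex coordinates and boolean activation masks for the two wedges; these data structures are hypothetical stand-ins for the study's surface-based analysis.

```python
# Sketch: distance between activation centroids on a cortical flat map.
# `coords` holds 2-D flat-map coordinates (in mm) of all vertices;
# `left_mask` / `right_mask` mark vertices activated by each wedge.
import numpy as np

def centroid_distance(coords, left_mask, right_mask):
    """Euclidean distance (mm) between the centroids of two activation patches."""
    c_left = coords[left_mask].mean(axis=0)
    c_right = coords[right_mask].mean(axis=0)
    return np.linalg.norm(c_right - c_left)
```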
The phase of prestimulus oscillations at 7–10 Hz has been shown to modulate perception of briefly presented visual stimuli. Specifically, a recent combined EEG-fMRI study suggested that a prestimulus oscillation at around 7 Hz represents open and closed windows for perceptual integration by modulating connectivity between lower-order occipital and higher-order parietal brain regions. Here we utilized brief event-related transcranial alternating current stimulation (tACS) to specifically modulate this prestimulus 7 Hz oscillation and the synchrony between parietal and occipital brain regions, thereby testing for a causal role of this particular prestimulus oscillation in perceptual integration. EEG was acquired at the same time, allowing us to investigate frequency-specific aftereffects phase-locked to stimulation offset. On a behavioural level, our results suggest that tACS did modulate perceptual integration, albeit in an unexpected manner. On an electrophysiological level, our results suggest that brief tACS does induce oscillatory entrainment, visible as frequency-specific activity phase-locked to stimulation offset. Together, our results do not strongly support a causal role of prestimulus 7 Hz oscillations in perceptual integration. However, they do suggest that brief tACS is capable of modulating oscillatory activity in a temporally sensitive manner.
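One simple way to quantify frequency-specific activity phase-locked to stimulation offset is to average trials time-locked to the offset, so that non-phase-locked activity cancels, and then read out the spectrum of that average at the stimulation frequency. The sketch below assumes a 7 Hz target and epoched offset-locked data, and is illustrative rather than the study's actual analysis.

```python
# Sketch: amplitude at the stimulation frequency in the trial-averaged,
# offset-locked EEG (a crude index of phase-locked entrainment aftereffects).
import numpy as np

def phase_locked_amplitude(epochs, fs, freq=7.0):
    """Amplitude at `freq` in the average of offset-locked epochs (trials x samples)."""
    evoked = epochs.mean(axis=0)                       # non-phase-locked activity averages out
    spectrum = np.fft.rfft(evoked) / evoked.size
    freqs = np.fft.rfftfreq(evoked.size, d=1 / fs)
    return 2 * np.abs(spectrum[np.argmin(np.abs(freqs - freq))])
```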
We examined the relationship between tolerance for audiovisual onset asynchrony (AVOA) and the spectrotemporal fidelity of the spoken words and the speaker’s mouth movements. In two experiments that varied only in the temporal order of the sensory modalities, with visual speech leading (Exp. 1) or lagging (Exp. 2) the acoustic speech, participants watched intact and blurred videos of a speaker uttering trisyllabic words and nonwords that were noise vocoded with 4, 8, 16, and 32 channels. They judged whether the speaker’s mouth movements and the speech sounds were in sync or out of sync. Individuals perceived synchrony (tolerated AVOA) on more trials when the acoustic speech was more speech-like (8 channels and higher vs. 4 channels) and when the visual speech was intact rather than blurred (Exp. 1 only). These findings suggest that enhanced spectrotemporal fidelity of the audiovisual (AV) signal prompts the brain to widen the window of integration, promoting the fusion of temporally distant AV percepts.
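A bare-bones noise vocoder of the kind used to create such stimuli can be sketched as follows; the band edges, filter orders, envelope cutoff, and the toy input signal are illustrative assumptions rather than the study's stimulus-generation parameters.

```python
# Sketch: an N-channel noise vocoder. The speech is split into frequency bands,
# the temporal envelope of each band is extracted, and that envelope modulates
# band-limited noise; summing the bands removes spectral fine structure while
# preserving the envelope cues that make the signal "speech-like".
import numpy as np
from scipy.signal import butter, filtfilt

def noise_vocode(speech, fs, n_channels=8, f_lo=100.0, f_hi=8000.0, env_cut=30.0):
    """Replace the fine structure in each band with envelope-modulated noise."""
    edges = np.geomspace(f_lo, f_hi, n_channels + 1)      # log-spaced band edges (assumed)
    rng = np.random.default_rng(0)
    noise = rng.normal(0, 1, speech.size)
    b_env, a_env = butter(2, env_cut, btype="low", fs=fs)
    out = np.zeros_like(speech, dtype=float)
    for lo, hi in zip(edges[:-1], edges[1:]):
        b, a = butter(3, [lo, hi], btype="bandpass", fs=fs)
        band = filtfilt(b, a, speech)
        env = np.clip(filtfilt(b_env, a_env, np.abs(band)), 0, None)  # rectify + low-pass
        out += env * filtfilt(b, a, noise)                            # band-limited noise carrier
    return out

# Toy usage with a synthetic amplitude-modulated tone standing in for speech.
fs = 22050
t = np.arange(0, 1.0, 1 / fs)
speech = np.sin(2 * np.pi * 220 * t) * (1 + np.sin(2 * np.pi * 3 * t))
vocoded = noise_vocode(speech, fs, n_channels=4)
```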