Visual capture and the ventriloquism aftereffect resolve spatial disparities of incongruent auditory-visual (AV) objects by shifting auditory spatial perception to align with vision. Here, we demonstrated the distinct temporal characteristics of visual capture and the ventriloquism aftereffect in response to brief AV disparities. In a set of experiments, subjects localized either the auditory component of AV targets (A within AV) or a second sound presented at varying delays (1-20s) after AV exposure (A2 after AV). AV targets were trains of brief presentations (1 or 20), covering a ±30° azimuthal range, and with ±8° (R or L) disparity. We found that the magnitude of visual capture generally reached its peak within a single AV pair and did not dissipate with time, while the ventriloquism aftereffect accumulated with repetitions of AV pairs and dissipated with time. Additionally, the magnitude of the auditory shift induced by each phenomenon was uncorrelated across listeners and visual capture was unaffected by subsequent auditory targets, indicating that visual capture and the ventriloquism aftereffect are separate mechanisms with distinct effects on auditory spatial perception. Our results indicate that visual capture is a ‘sample-and-hold’ process that binds related objects and stores the combined percept in memory, whereas the ventriloquism aftereffect is a ‘leaky integrator’ process that accumulates with experience and decays with time to compensate for cross-modal disparities.
The ventriloquism aftereffect (VAE) refers to a shift in auditory spatial perception following exposure to a spatial disparity between auditory and visual stimuli. The VAE has been previously measured on two distinct time scales. Hundreds or thousands of exposures to a an audio-visual spatial disparity produces enduring VAE that persists after exposure ceases. Exposure to a single audio-visual spatial disparity produces immediate VAE that decays over seconds. To determine if these phenomena are two extremes of a continuum or represent distinct processes, we conducted an experiment with normal hearing listeners that measured VAE in response to a repeated, constant audio-visual disparity sequence, both immediately after exposure to each audio-visual disparity and after the end of the sequence. In each experimental session, subjects were exposed to sequences of auditory and visual targets that were constantly offset by +8° or −8° in azimuth from one another, then localized auditory targets presented in isolation following each sequence. Eye position was controlled throughout the experiment, to avoid the effects of gaze on auditory localization. In contrast to other studies that did not control eye position, we found both a large shift in auditory perception that decayed rapidly after each AV disparity exposure, along with a gradual shift in auditory perception that grew over time and persisted after exposure to the AV disparity ceased. We modeled the temporal and spatial properties of the measured auditory shifts using grey box nonlinear system identification, and found that two models could explain the data equally well. In the power model, the temporal decay of the ventriloquism aftereffect was modeled with a power law relationship. This causes an initial rapid drop in auditory shift, followed by a long tail which accumulates with repeated exposure to audio-visual disparity. In the double exponential model, two separate processes were required to explain the data, one which accumulated and decayed exponentially and the other which slowly integrated over time. Both models fit the data best when the spatial spread of the ventriloquism aftereffect was limited to a window around the location of the audio-visual disparity. We directly compare the predictions made by each model, and suggest additional measurements that could help distinguish which model best describes the mechanisms underlying the VAE.
Band importance functions estimate the relative contribution of individual acoustic frequency bands to speech intelligibility. Previous studies of band importance in listeners with cochlear implants have used experimental maps and direct stimulation. Here, band importance was estimated for clinical maps with acoustic stimulation. Listeners with cochlear implants had band importance functions that relied more heavily on lower frequencies and showed less cross-listener consistency than in listeners with normal hearing. The intersubject variability observed here indicates that averaging band importance functions across listeners with cochlear implants, as has been done in previous studies, may not be meaningful. Additionally, band importance functions of listeners with normal hearing for vocoded speech that either did or did not simulate spread of excitation were not different from one another, suggesting that additional factors beyond spread of excitation are necessary to account for changes in band importance in listeners with cochlear implants.
Purpose The goal of this study was to determine how various aspects of cognition predict speech recognition ability across different levels of speech vocoding within a single group of listeners. Method We tested the ability of young adults ( N = 32) with normal hearing to recognize Perceptually Robust English Sentence Test Open-set (PRESTO) sentences that were degraded with a vocoder to produce different levels of spectral resolution (16, eight, and four carrier channels). Participants also completed tests of cognition (fluid intelligence, short-term memory, and attention), which were used as predictors of sentence recognition. Sentence recognition was compared across vocoder conditions, predictors were correlated with individual differences in sentence recognition, and the relationships between predictors were characterized. Results PRESTO sentence recognition performance declined with a decreasing number of vocoder channels, with no evident floor or ceiling performance in any condition. Individual ability to recognize PRESTO sentences was consistent relative to the group across vocoder conditions. Short-term memory, as measured with serial recall, was a moderate predictor of sentence recognition (ρ = 0.65). Serial recall performance was constant across vocoder conditions when measured with a digit span task. Fluid intelligence was marginally correlated with serial recall, but not sentence recognition. Attentional measures had no discernible relationship to sentence recognition and a marginal relationship with serial recall. Conclusions Verbal serial recall is a substantial predictor of vocoded sentence recognition, and this predictive relationship is independent of spectral resolution. In populations that show variable speech recognition outcomes, such as listeners with cochlear implants, it should be possible to account for the independent effects of spectral resolution and verbal serial recall in their speech recognition ability. Supplemental Material https://doi.org/10.23641/asha.12021051
Vision typically has better spatial accuracy and precision than audition, and as a result often captures auditory spatial perception when visual and auditory cues are presented together. One determinant of visual capture is the amount of spatial disparity between auditory and visual cues: when disparity is small visual capture is likely to occur, and when disparity is large visual capture is unlikely. Previous experiments have used two methods to probe how visual capture varies with spatial disparity. First, congruence judgment assesses perceived unity between cues by having subjects report whether or not auditory and visual targets came from the same location. Second, auditory localization assesses the graded influence of vision on auditory spatial perception by having subjects point to the remembered location of an auditory target presented with a visual target. Previous research has shown that when both tasks are performed concurrently they produce similar measures of visual capture, but this may not hold when tasks are performed independently. Here, subjects alternated between tasks independently across three sessions. A Bayesian inference model of visual capture was used to estimate perceptual parameters for each session, which were compared across tasks. Results demonstrated that the range of audio-visual disparities over which visual capture was likely to occur were narrower in auditory localization than in congruence judgment, which the model indicates was caused by subjects adjusting their prior expectation that targets originated from the same location in a task-dependent manner.
Objectives: Serial recall of digits is frequently used to measure short-term memory span in various listening conditions. However, the use of digits may mask the effect of low quality auditory input. Digits have high frequency and are phonologically distinct relative to one another, so they should be easy to identify even with low quality auditory input. In contrast, larger item sets reduce listener ability to strategically constrain their expectations, which should reduce identification accuracy and increase the time and/or cognitive resources needed for identification when auditory quality is low. This diminished accuracy and increased cognitive load should interfere with memory for sequences of items drawn from large sets. The goal of this work was to determine whether this predicted interaction between auditory quality and stimulus set in short-term memory exists, and if so, whether this interaction is associated with processing speed, vocabulary, or attention. Design: We compared immediate serial recall within young adults with normal hearing across unprocessed and vocoded listening conditions for multiple stimulus sets. Stimulus sets were lists of digits (1 to 9), consonant-vowel-consonant (CVC) words (chosen from a list of 60 words), and CVC nonwords (chosen from a list of 50 nonwords). Stimuli were unprocessed or vocoded with an eight-channel noise vocoder. To support interpretation of responses, words and nonwords were selected to minimize inclusion of multiple phonemes from within a confusion cluster. We also measured receptive vocabulary (Peabody Picture Vocabulary Test [PPVT-4]), sustained attention (test of variables of attention [TOVA]), and repetition speed for individual items from each stimulus set under both listening conditions. Results: Vocoding stimuli had no impact on serial recall of digits, but reduced memory span for words and nonwords. This reduction in memory span was attributed to an increase in phonological confusions for nonwords. However, memory span for vocoded word lists remained reduced even after accounting for common phonetic confusions, indicating that lexical status played an additional role across listening conditions. Principal components analysis found two components that explained 84% of the variance in memory span across conditions. Component one had similar load across all conditions, indicating that participants had an underlying memory capacity, which was common to all conditions. Component two was loaded by performance in the vocoded word and nonword conditions, representing the sensitivity of memory span to vocoding of these stimuli. The order in which participants completed listening conditions had a small effect on memory span that could not account for the effect of listening condition. Repetition speed was fastest for digits, slower for words, and slowest for nonwords. On average, vocoding slowed repetition speed for all stimuli, but repetition speed was not predictive of individual memory span. Vocabulary and attention showed no correlation with memory span. Conclusions: Our results replicated previous findings that low quality auditory input can impair short-term memory, and demonstrated that this impairment is sensitive to stimulus set. Using multiple stimulus sets in degraded listening conditions can isolate memory capacity (in digit span) from impaired item identification (in word and nonword span), which may help characterize the relationship between memory and speech recognition in difficult listening conditions.
Purpose In individuals with cochlear implants, speech recognition is not associated with tests of working memory that primarily reflect storage, such as forward digit span. In contrast, our previous work found that vocoded speech recognition in individuals with normal hearing was correlated with performance on a forward digit span task. A possible explanation for this difference across groups is that variability in auditory resolution across individuals with cochlear implants could conceal the true relationship between speech and memory tasks. Here, our goal was to determine if performance on forward digit span and speech recognition tasks are correlated in individuals with cochlear implants after controlling for individual differences in auditory resolution. Method We measured sentence recognition ability in 20 individuals with cochlear implants with Perceptually Robust English Sentence Test Open-set sentences. Spectral and temporal modulation detection tasks were used to assess individual differences in auditory resolution, auditory forward digit span was used to assess working memory storage, and self-reported word familiarity was used to assess vocabulary. Results Individual differences in speech recognition were predicted by spectral and temporal resolution. A correlation was found between forward digit span and speech recognition, but this correlation was not significant after controlling for spectral and temporal resolution. No relationship was found between word familiarity and speech recognition. Forward digit span performance was not associated with individual differences in auditory resolution. Conclusions Our findings support the idea that sentence recognition in individuals with cochlear implants is primarily limited by individual differences in working memory processing, not storage. Studies examining the relationship between speech and memory should control for individual differences in auditory resolution.
Acoustics research involving human participants typically takes place in specialized laboratory settings. Listening studies, for example, may present controlled sounds using calibrated transducers in sound-attenuating or anechoic chambers. In contrast, remote testing takes place outside of the laboratory in everyday settings (e.g., participants' homes). Remote testing could provide greater access to participants, larger sample sizes, and opportunities to characterize performance in typical listening environments at the cost of reduced control of environmental conditions, less precise calibration, and inconsistency in attentional state and/or response behaviors from relatively smaller sample sizes and unintuitive experimental tasks. The Acoustical Society of America Technical Committee on Psychological and Physiological Acoustics launched the Task Force on Remote Testing ( https://tcppasa.org/remotetesting/ ) in May 2020 with goals of surveying approaches and platforms available to support remote testing and identifying challenges and considerations for prospective investigators. The results of this task force survey were made available online in the form of a set of Wiki pages and summarized in this report. This report outlines the state-of-the-art of remote testing in auditory-related research as of August 2021, which is based on the Wiki and a literature search of papers published in this area since 2020, and provides three case studies to demonstrate feasibility during practice.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.