Inputs delivered to different sensory organs provide us with complementary speech information about the environment. The goal of this study was to establish which multisensory characteristics can facilitate speech recognition in noise. The major finding is that tracking the temporal cues of visual or tactile speech signals that are synchronized with auditory speech can play a key role in speech-in-noise performance. This suggests that multisensory interactions are fundamentally important for speech recognition in noisy environments, and that they require salient temporal cues. The amplitude envelope, which serves as a reliable source of temporal cues, can be delivered through different sensory modalities when speech recognition is compromised.
The inputs delivered to different sensory organs provide us with complementary information about the environment. Our recent study demonstrated that presenting abstract visual information of speech envelopes substantially improves speech perception ability in normal-hearing (NH) listeners [Yuan et al., J. Acoust. Soc. Am. (2020)]. The purpose of this study was to extend this audiovisual speech perception benefit to the tactile domain. Twenty adults participated in sentence recognition threshold measurements in four sensory modality conditions (AO: auditory-only; AV: auditory-visual; AT: audio-tactile; and AVT: audio-visual-tactile). The target sentence [CRM speech corpus, Bolia et al., J. Acoust. Soc. Am. (2000)] level was fixed at 60 dBA, and the masker (speech-shaped noise) levels were adaptively varied to find masked thresholds. For comparison, the amplitudes of both the visual and vibrotactile stimuli were either temporally synchronized or non-synchronized with the target speech envelope. Results show that temporally coherent multi-modal stimulation (AV, AT, and AVT) significantly improves speech perception ability when compared to audio-only (AO) stimulation. These multisensory speech perception benefits were reduced when the cross-modal temporal coherence characteristics were eliminated. These findings suggest that multisensory interactions are fundamentally important for speech perception ability in NH listeners. The outcome of this multisensory speech processing depends strongly on the temporal coherence between multi-modal sensory inputs.
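The adaptive procedure mentioned above (target fixed at 60 dBA, masker level varied to find a masked threshold) can be illustrated with a minimal sketch. The abstract does not state the tracking rule, step size, or stopping criterion, so the 1-up/1-down rule, the 2 dB step, the starting masker level, and the six-reversal stop used here are all assumptions, and `run_staircase` and `respond` are hypothetical names chosen for illustration.

```python
from statistics import mean


def run_staircase(respond, target_db=60.0, start_masker_db=50.0,
                  step_db=2.0, n_reversals=6):
    """Estimate a masked threshold as a signal-to-noise ratio in dB.

    respond(snr_db) -> bool is a hypothetical callback that returns True
    when the listener reports the target sentence correctly.
    All default values except target_db are assumptions, not study values.
    """
    masker_db = start_masker_db
    last_direction = 0   # +1: masker was raised, -1: masker was lowered
    reversals = []       # masker levels at which the response flipped

    while len(reversals) < n_reversals:
        correct = respond(target_db - masker_db)
        # Assumed 1-up/1-down rule: raise the masker after a correct
        # response (harder), lower it after an incorrect one (easier).
        direction = 1 if correct else -1
        if last_direction and direction != last_direction:
            reversals.append(masker_db)
        last_direction = direction
        masker_db += direction * step_db

    # Threshold SNR: fixed target level minus the mean reversal masker level.
    return target_db - mean(reversals)


if __name__ == "__main__":
    # Idealized deterministic listener, used only to exercise the sketch.
    def ideal_listener(snr_db):
        return snr_db >= -6.0

    print(run_staircase(ideal_listener))
```

With a real listener, `respond` would present a sentence at the given signal-to-noise ratio in one of the AO, AV, AT, or AVT conditions and score the response. Comparing the returned thresholds across conditions is one way to express the multisensory benefit in dB.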