Classic accounts of the benefits of speechreading to speech recognition treat auditory and visual channels as independent sources of information that are integrated fairly early in the speech perception process. The primary question addressed in this study was whether visible movements of the speech articulators could be used to improve the detection of speech in noise, thus demonstrating an influence of speechreading on the ability to detect, rather than recognize, speech. In the first experiment, ten normal-hearing subjects detected the presence of three known spoken sentences in noise under three conditions: auditory-only (A), auditory plus speechreading with a visually matched sentence (AV(M)) and auditory plus speechreading with a visually unmatched sentence (AV(UM). When the speechread sentence matched the target sentence, average detection thresholds improved by about 1.6 dB relative to the auditory condition. However, the amount of threshold reduction varied significantly for the three target sentences (from 0.8 to 2.2 dB). There was no difference in detection thresholds between the AV(UM) condition and the A condition. In a second experiment, the effects of visually matched orthographic stimuli on detection thresholds was examined for the same three target sentences in six subjects who participated in the earlier experiment. When the orthographic stimuli were presented just prior to each trial, average detection thresholds improved by about 0.5 dB relative to the A condition. However, unlike the AV(M) condition, the detection improvement due to orthography was not dependent on the target sentence. Analyses of correlations between area of mouth opening and acoustic envelopes derived from selected spectral regions of each sentence (corresponding to the wide-band speech, and first, second, and third formant regions) suggested that AV(M) threshold reduction may be determined by the degree of auditory-visual temporal coherence, especially between the area of lip opening and the envelope derived from mid- to high-frequency acoustic energy. Taken together, the data (for these sentences at least) suggest that visual cues derived from the dynamic movements of the fact during speech production interact with time-aligned auditory cues to enhance sensitivity in auditory detection. The amount of visual influence depends in part on the degree of correlation between acoustic envelopes and visible movement of the articulators.
Factors leading to variability in auditory-visual (AV) speech recognition include the subject's ability to extract auditory (A) and visual (V) signal-related cues, the integration of A and V cues, and the use of phonological, syntactic, and semantic context. In this study, measures of A, V, and AV recognition of medial consonants in isolated nonsense syllables and of words in sentences were obtained in a group of 29 hearing-impaired subjects. The test materials were presented in a background of speech-shaped noise at 0-dB signal-to-noise ratio. Most subjects achieved substantial AV benefit for both sets of materials relative to A-alone recognition performance. However, there was considerable variability in AV speech recognition both in terms of the overall recognition score achieved and in the amount of audiovisual gain. To account for this variability, consonant confusions were analyzed in terms of phonetic features to determine the degree of redundancy between A and V sources of information. In addition, a measure of integration ability was derived for each subject using recently developed models of AV integration. The results indicated that (1) AV feature reception was determined primarily by visual place cues and auditory voicing + manner cues, (2) the ability to integrate A and V consonant cues varied significantly across subjects, with better integrators achieving more AV benefit, and (3) significant intra-modality correlations were found between consonant measures and sentence measures, with AV consonant scores accounting for approximately 54% of the variability observed for AV sentence recognition. Integration modeling results suggested that speechreading and AV integration training could be useful for some individuals, potentially providing as much as 26% improvement in AV consonant recognition.
For all but the most profoundly hearing-impaired ͑HI͒ individuals, auditory-visual ͑AV͒ speech has been shown consistently to afford more accurate recognition than auditory ͑A͒ or visual ͑V͒ speech. However, the amount of AV benefit achieved ͑i.e., the superiority of AV performance in relation to unimodal performance͒ can differ widely across HI individuals. To begin to explain these individual differences, several factors need to be considered. The most obvious of these are deficient A and V speech recognition skills. However, large differences in individuals' AV recognition scores persist even when unimodal skill levels are taken into account. These remaining differences might be attributable to differing efficiency in the operation of a perceptual process that integrates A and V speech information. There is at present no accepted measure of the putative integration process. In this study, several possible integration measures are compared using both congruent and discrepant AV nonsense syllable and sentence recognition tasks. Correlations were tested among the integration measures, and between each integration measure and independent measures of AV benefit for nonsense syllables and sentences in noise. Integration measures derived from tests using nonsense syllables were significantly correlated with each other; on these measures, HI subjects show generally high levels of integration ability. Integration measures derived from sentence recognition tests were also significantly correlated with each other, but were not significantly correlated with the measures derived from nonsense syllable tests. Similarly, the measures of AV benefit based on nonsense syllable recognition tests were found not to be significantly correlated with the benefit measures based on tests involving sentence materials. Finally, there were significant correlations between AV integration and benefit measures derived from the same class of speech materials, but nonsignificant correlations between integration and benefit measures derived from different classes of materials. These results suggest that the perceptual processes underlying AV benefit and the integration of A and V speech information might not operate in the same way on nonsense syllable and sentence input. ͓S0001-4966͑98͒03510-3͔
These findings encourage further study of attentional and other cognitive factors that accompany speech listening by the hearing impaired.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.