In a spoken utterance, a talker expresses linguistic constituents in serial order. A listener resolves these linguistic properties in the rapidly fading auditory sample. Classic measures agree that auditory integration occurs at a fine temporal grain. In contrast, recent studies have proposed that sensory integration of speech occurs at a coarser grain approximate to the syllable, based on indirect and relatively insensitive perceptual measures. Evidence from cognitive neuroscience and behavioral primatology has also been adduced to support the claim of sensory integration at the pace of syllables. The present investigation uses direct performance measures of integration, applying an acoustic technique to isolate the contribution of short-term acoustic properties to the assay of modulation sensitivity. In corroborating the classic finding of a fine temporal grain of integration, these functional measures can inform theory and speculation in accounts of speech perception.
Linear prediction is a widely available technique for analyzing acoustic properties of speech, although this method is known to be error-prone. New tests assessed the adequacy of linear prediction estimates by using this method to derive synthesis parameters and testing the intelligibility of the synthetic speech that results. Matched sets of sine-wave sentences were created, one set using uncorrected linear prediction estimates of natural sentences, the other using estimates made by hand. Phoneme restrictions imposed on linguistic properties allowed comparisons between continuous and intermittent voicing, oral or nasal and fricative manner, and unrestricted phonemic variation. Intelligibility tests revealed uniformly good performance with sentences created by handestimation and a minimal decrease in intelligibility with estimation by linear prediction due to manner variation with continuous voicing. Poorer performance was observed when linear prediction estimates were used to produce synthetic versions of phonemically unrestricted sentences, but no similar decline was observed with synthetic sentences produced by hand estimation. The results show a substantial intelligibility cost of reliance on uncorrected linear prediction estimates when phonemic variation approaches natural incidence.
Classic research on the perception of speech sought to identify minimal acoustic correlates of each consonant and vowel. In explaining perception, this view designated momentary components of an acoustic spectrum as cues to the recognition of elementary phonemes. This conceptualization of speech perception is untenable given the findings of phonetic sensitivity to modulation independent of the acoustic and auditory form of the carrier. The empirical key is provided by studies of the perceptual organization of speech, a low-level integrative function that finds and follows the sensory effects of speech amid concurrent events. These projects have shown that the perceptual organization of speech is keyed to modulation; fast; unlearned; nonsymbolic; indifferent to short-term auditory properties; and organization requires attention. The ineluctably multisensory nature of speech perception also imposes conditions that distinguish language among cognitive systems. WIREs Cogn Sci 2013, 4:213–223. doi: 10.1002/wcs.1213
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.