A three-tone sinusoidal replica of a naturally produced utterance was identified by listeners, despite the readily apparent unnatural speech quality of the signal. The time-varying properties of these highly artificial acoustic signals are apparently sufficient to support perception of the linguistic message in the absence of traditional acoustic cues for phonetic segments.
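The construction of such a replica can be sketched in code: each formant of the natural utterance is replaced by a single pure tone whose frequency and amplitude vary over time to follow that formant's track, and the tones are summed. The function below is a minimal illustration, not the authors' actual synthesis procedure; the formant and amplitude tracks passed in are assumed to come from prior acoustic analysis of the utterance.

```python
import numpy as np

def sinewave_replica(formant_tracks, amp_tracks, frame_rate, sr=16000):
    """Sum one time-varying pure tone per formant track.

    formant_tracks: list of per-frame frequency arrays (Hz), one per formant.
    amp_tracks:     matching per-frame linear-amplitude arrays.
    frame_rate:     analysis frames per second.
    sr:             output sample rate (Hz).
    """
    n_frames = len(formant_tracks[0])
    samples_per_frame = sr // frame_rate
    n = n_frames * samples_per_frame
    frame_times = np.arange(n_frames) * samples_per_frame
    out = np.zeros(n)
    for f_track, a_track in zip(formant_tracks, amp_tracks):
        # Interpolate the frame-rate tracks up to the audio sample rate.
        f = np.interp(np.arange(n), frame_times, f_track)
        a = np.interp(np.arange(n), frame_times, a_track)
        # Integrate instantaneous frequency to obtain phase, then oscillate.
        phase = 2 * np.pi * np.cumsum(f) / sr
        out += a * np.sin(phase)
    return out
```

A three-tone replica of a word would pass three formant tracks (for F1, F2, F3); the result preserves the patterned spectral variation of the utterance while discarding the harmonic structure that carries natural voice quality.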
Accounts of the identification of words and talkers commonly rely on different acoustic properties. To identify a word, a perceiver discards acoustic aspects of an utterance that are talker specific, forming an abstract representation of the linguistic message with which to probe a mental lexicon. To identify a talker, a perceiver discards acoustic aspects of an utterance specific to particular phonemes, creating a representation of voice quality with which to search for familiar talkers in long-term memory. In 3 experiments, sinewave replicas of natural speech sampled from 10 talkers eliminated natural voice quality while preserving idiosyncratic phonetic variation. Listeners identified the sinewave talkers without recourse to acoustic attributes of natural voice quality. This finding supports a revised description of speech perception in which the phonetic properties of utterances serve to identify both words and talkers.
How does a perceiver resolve the linguistic properties of an utterance? This question has motivated many investigations within the study of speech perception and a great variety of explanations. In a retrospective summary 15 years ago, Klatt (1989) reviewed a large sample of theoretical descriptions of the perceiver's ability to project the sensory effects of speech, exhibiting inexhaustible variety, into a finite and small number of linguistically defined attributes, whether features, phones, phonemes, syllables, or words. Although he noted many distinctions among the accounts, with few exceptions they exhibited a common feature. Each presumed that perception begins with a speech signal, well-composed and fit to analyze. This common premise shared by otherwise divergent explanations of perception obliges the models to admit severe and unintended constraints on their applicability. To exist within the limits set by this simplifying assumption, the models are restricted to a domain in which speech is the only sound; moreover, only a single talker ever speaks at once. Although this designation is easily met in laboratory samples, it is safe to say that it is rare in vivo. Moreover, in their exclusive devotion to the perception of speech the models are tacitly modular (Fodor, 1983), whether or not they acknowledge it.

Despite the consequences of this dedication of perceptual models to speech and speech alone, there has been a plausible and convenient way to persist in invoking the simplifying assumption. This fundamental premise survives intact if a preliminary process of perceptual organization finds a speech signal, follows its patterned variation amid the effects of other sound sources, and delivers it whole and ready to analyze for linguistic properties. The indifference to the conditions imposed by the common perspective reflects an apparent consensus that perceptual organization of speech is simple, automatic, and accomplished by generic means.
However, despite the rapidly established perceptual coherence of the constituents of a speech signal, the perceptual organization of speech cannot be reduced to the available and well-established principles of auditory perceptual organization.
In 5 experiments, the authors investigated how listeners learn to recognize unfamiliar talkers and how experience with specific utterances generalizes to novel instances. Listeners were trained over several days to identify 10 talkers from natural, sinewave, or reversed speech sentences. The sinewave signals preserved phonetic and some suprasegmental properties while eliminating natural vocal quality. In contrast, the reversed speech signals preserved vocal quality while distorting temporally based phonetic properties. The training results indicate that listeners learned to identify talkers even from acoustic signals lacking natural vocal quality. Generalization performance varied across the different signals and depended on the salience of phonetic information. The results suggest similarities in the phonetic attributes underlying talker recognition and phonetic perception.

When a talker produces an utterance, the listener simultaneously apprehends the linguistic form of the message as well as the nonlinguistic attributes of the talker's unique vocal anatomy and pronunciation habits. Anatomical and stylistic differences in articulation convey an array of personal or indexical qualities, such as personal identity, sex, approximate age, ethnicity, personality, intentions or emotional state, level of alcohol intoxication, and facial expression (see Bricker
Function morphemes or functors (e.g., articles and verb inflections) potentially provide children with cues for segmenting speech into constituents, as well as for labeling these constituents (e.g., noun phrase [NP] and verb phrase [VP]). However, the fact that young children often fail to produce functors may indicate that they ignore these cues in early language acquisition. Alternatively, children may be sensitive to functors in perception, but omit them in production. In 3 experiments, 2-year-olds imitated sentences that contained English or non-English functors and that were controlled for both suprasegmental and segmental factors. Children omitted English functors more frequently than non-English functors, indicating perceptual sensitivity to familiar vs. unfamiliar elements. The results suggest that children may be able to use functors early in language acquisition to solve the segmentation and labeling problems.

How do children come to treat the incoming speech stream as composed of linguistic units, such as clauses and phrases? How are they able to distinguish among different types of these constituents? We shall refer to these as the segmentation and labeling problems, respectively. Recent discussions have begun to focus on the importance of function morphemes in guiding young children to segment speech and to label grammatical categories (Gleitman & Wanner, 1982; Maratsos, 1982; Morgan, Meier, & Newport, 1987; Valian & Coulson, 1988). However, children learning English and other languages typically omit function morphemes in their spontaneous and imitative speech, suggesting that these cues may not be used in the earliest stages of learning. In this article, we examine the alternative possibility that young children detect and analyze functors even though they omit them in their speech.
More specifically, we assess the possibility that functors are analyzed with sufficient detail to support both segmentation and labeling. In English and in other languages, syntactic units such as

This research was carried out by LouAnn Gerken as part of her doctoral dissertation at Columbia University. Some of the results were pre-
Our studies revealed two stable modes of perceptual organization, one based on attributes of auditory sensory elements and another based on attributes of patterned sensory variation composed by the aggregation of sensory elements. In a dual-task method, listeners attended concurrently to both aspects, component and pattern, of a sine wave analogue of a word. Organization of elements was indexed by several single-mode tests of auditory form perception to verify the perceptual segregation of either an individual formant of a synthetic word or a tonal component of a sinusoidal word analogue. Organization of patterned variation was indexed by a test of lexical identification. The results show the independence of the perception of auditory and phonetic form, which appear to be differently organized concurrent effects of the same acoustic cause.
Sine wave replicas of spoken words can be perceived both as nonphonetic auditory forms and as words, depending on a listener's experience. In this study, brain areas activated by sine wave words were studied with fMRI in two conditions: when subjects perceived the sounds spontaneously as nonphonetic auditory forms ("naïve condition") and after instruction and brief practice attending to their phonetic attributes ("informed condition"). The test items were composed such that half replicated natural words ("phonetic items") and the other half did not, because the tone analogs of the first and third formants had been temporally reversed ("nonphonetic items"). Subjects were asked to decide whether an isolated tone analog of the second formant (T2) presented before the sine wave word (T1234) was included in it. Experience in attending to the phonetic properties of the sinusoids interfered with this auditory matching task and was accompanied by a decrease in auditory cortex activation with word replicas but not with the acoustically matched nonphonetic items. Because the activation patterns elicited by equivalent acoustic test items depended on a listener's awareness of their phonetic potential, this indicates that the analysis of speech sounds in the auditory cortex is distinct from the simple resolution of auditory form, and is not a mere consequence of acoustic complexity. Because arbitrary acoustic patterns did not evoke the response observed for phonetic patterns, these findings suggest that the perception of speech is contingent on the presence of familiar patterns of spectral variation. The results are consistent with a short-term functional reorganization of auditory analysis induced by phonetic experience with sine wave replicas and contingent on the dynamic acoustic structure of speech.