When members of a series of synthesized stop consonants varying acoustically in F3 characteristics and varying perceptually from /da/ to /ga/ are preceded by /al/, subjects report hearing more /ga/ syllables relative to when each member is preceded by /ar/ (Mann, 1980). It has been suggested that this result demonstrates the existence of a mechanism that compensates for coarticulation via tacit knowledge of articulatory dynamics and constraints, or through perceptual recovery of vocal-tract dynamics. The present study was designed to assess the degree to which these perceptual effects are specific to qualities of human articulatory sources. In three experiments, series of consonant-vowel (CV) stimuli varying in F3-onset frequency (/da/-/ga/) were preceded by speech versions or nonspeech analogues of /al/ and /ar/. The effect of liquid identity on stop consonant labeling remained when the preceding VC was produced by a female speaker and the CV syllable was modeled after a male speaker's productions. Labeling boundaries also shifted when the CV was preceded by a sine wave glide modeled after the F3 characteristics of /al/ and /ar/. Identifications shifted even when the preceding sine wave was of constant frequency equal to the offset frequency of F3 from a natural production. These results suggest an explanation in terms of general auditory processes as opposed to recovery of, or knowledge of, specific articulatory dynamics.

Despite 40 years of sustained effort to develop machine speech-recognition devices, no engineering approach to speech perception has achieved the success of an average 2-year-old human. One of the more daunting aspects of speech for these efforts is the acoustic effect of coarticulation. Traditionally, coarticulation refers to the spatial and temporal overlap of adjacent articulatory activities. This is reflected in the acoustic signal by severe context dependence; acoustic information specifying one phoneme varies substantially depending on surrounding phonemes. As a result, there is a lack of invariance between linguistic units (e.g., phonemes, morphemes) and attributes of the acoustic signal. This poses quite a problem for speech-recognition devices that are designed to output strings of phonemes.

An example of coarticulatory influence is the effect of a preceding liquid on the acoustic realization of a subsequent stop consonant. Mann (1980) reports that articulation of the syllables /da/ and /ga/ may be influenced by the production of a preceding /al/ or /ar/. Articulatorily described, the physical realizations of the phonemes /d/ and /g/ differ primarily in the place at which the tongue occludes the vocal tract. For a velar stop [g], the tongue body is raised against the soft palate at the rear of the mouth, whereas for an alveolar stop [d], the tongue tip comes in contact with the alveolar ridge toward the front of the oral cavity, behind the teeth. The liquids /l/ and /r/ differ in a similar manner; an [r] is produced with the tongue raised toward the rear of the cavity, and an [l] is produced with the tongue tip toward the front of the oral cavity.
The ability to integrate and weight information across dimensions is central to perception and is particularly important for speech categorization. The present experiments investigate cue weighting by training participants to categorize sounds drawn from a two-dimensional acoustic space defined by the center frequency (CF) and modulation frequency (MF) of frequency-modulated sine waves. These dimensions were psychophysically matched to be equally discriminable and, in the first experiment, were equally informative for accurate categorization. Nevertheless, listeners' category responses reflected a bias for use of CF. This bias remained even when the informativeness of CF was decreased by shifting distributions to create more overlap in CF. A reversal of weighting (MF over CF) was obtained when distribution variance was increased for CF. These results demonstrate that even when equally informative and discriminable, acoustic cues are not necessarily equally weighted in categorization; listeners exhibit biases when integrating multiple acoustic dimensions. Moreover, changes in weighting strategies can be affected by changes in input distribution parameters. This methodology provides potential insights into acquisition of speech sound categories, particularly second language categories. One implication is that ineffective cue weighting strategies for phonetic categories may be alleviated by manipulating the variance of uninformative dimensions in training stimuli.
This chapter focuses on one of the first steps in comprehending spoken language: How do listeners extract the most fundamental linguistic elements (consonants and vowels, or the distinctive features that compose them) from the acoustic signal? We begin by describing three major theoretical perspectives on the perception of speech. Then we review several lines of research that are relevant to distinguishing these perspectives. The research topics surveyed include categorical perception, phonetic context effects, learning of speech and related nonspeech categories, and the relation between speech perception and production. Finally, we describe challenges facing each of the major theoretical perspectives on speech perception.
Purpose: In this study, the authors examined whether rhythm metrics capable of distinguishing languages with high and low temporal stress contrast can also distinguish among control and dysarthric speakers of American English with perceptually distinct rhythm patterns. Methods: Acoustic measures of vocalic and consonantal segment durations were obtained for speech samples from 55 speakers across 5 groups (hypokinetic, hyperkinetic, flaccid-spastic, and ataxic dysarthrias, plus controls). Segment durations were used to calculate standard and new rhythm metrics. Discriminant function analyses (DFAs) were used to determine which sets of predictor variables (rhythm metrics) best discriminated between groups (control vs. dysarthrias, and among the 4 dysarthrias). A cross-validation method was used to test the robustness of each original DFA. Results: The majority of classification functions were more than 80% successful in classifying speakers into their appropriate group. New metrics that combined successive vocalic and consonantal segments emerged as important predictor variables. DFAs pitting each dysarthria group against the combined others resulted in unique constellations of predictor variables that yielded high levels of classification accuracy. Conclusions: This study confirms the ability of rhythm metrics to distinguish control speech from dysarthrias and to discriminate dysarthria subtypes. Rhythm metrics show promise for use as a rational and objective clinical tool.
The discovery of mirror neurons, a class of neurons that respond when a monkey performs an action and also when the monkey observes others producing the same action, has promoted a renaissance for the Motor Theory (MT) of speech perception. This is because mirror neurons seem to accomplish the same kind of one-to-one mapping between perception and action that MT theorizes to be the basis of human speech communication. However, this seeming correspondence is superficial, and there are theoretical and empirical reasons to temper enthusiasm about the explanatory role mirror neurons might have for speech perception. In fact, rather than providing support for MT, mirror neurons are actually inconsistent with the central tenets of MT.

Mirror neurons and the motor theory of speech perception

One of the more intriguing and highly cited theories in cognitive science is the Motor Theory of speech perception (MT) proposed by Alvin Liberman and his collaborators [1][2][3]. According to MT, humans perceive speech sounds not as sounds, per se, but as the "intended phonetic gestures of the speaker" [2]. The proposal is that production and perception of speech share the same neural processes and representations, based in a linguistic module evolved specifically for communication. Empirical tests of the predictions of MT have provided mixed support, at best [4] (Box 1), and the number of proponents of MT in the field of speech perception has dwindled. However, the discovery of a class of mirror neurons in monkeys [5,6] and a purported homologous mirror system in humans [7,8] has resulted in a recent renaissance for MT. The discovery of mirror neurons has affected research in the neuroscience of speech and language processing, speech development, and language evolution, to name several domains [9][10][11][12][13][14].

Mirror neurons are a class of neurons found in the premotor cortex of the monkey that respond both when performing an action, such as grasping food, and when seeing someone else (such as a human) perform the same action [5]. Thus, it has been suggested that perception and production of action potentially share a common neural code. The salient parallel between this description of mirror neurons and MT's proposal for a common perception and production code has not escaped attention. It has become common for articles on mirror neurons to reference MT and to suggest that the mirror system in humans could have an important role in speech perception [14][15][16][17]. This link is supported by the fact that the area of monkey cortex where mirror neurons were discovered (F5) might be homologous to Broca's area in humans, which has long been implicated in speech and language [18,19] (see also Ref. [20]).