Many theories predict the presence of interactive effects involving information represented by distinct cognitive processes in speech production. There is considerably less agreement regarding the precise cognitive mechanisms that underlie these interactive effects. For example, are they driven by purely production-internal mechanisms (e.g., Dell, 1986), or do they reflect the influence of perceptual monitoring mechanisms on production processes (e.g., Roelofs, 2004)? Acoustic analyses reveal that the phonetic realization of words is influenced by their word-specific properties, supporting the presence of interaction between lexical-level and phonetic information in speech production. A second experiment examines which mechanisms are responsible for this interactive effect. The results suggest that the effect occurs online and is not purely driven by listener modeling. These findings are consistent with the presence of an interactive mechanism that is online and internal to the production system.

A great deal of research has documented the presence of interactive effects (the interaction of distinct types of information represented by different cognitive processes) within both speech production and perception. The Ganong effect (Ganong, 1980) is a prototypical example; it illustrates the interaction of lexical and phonetic category information in speech perception. When presented with syllables whose initial consonants vary along a voice-onset time (VOT) continuum, listeners' identification of the initial phoneme is sensitive to whether the resulting syllable is a word or a nonword. This is an interactive effect because the two types of information are assumed to be encoded by distinct processing stages in speech perception.
The lexicality of a sound sequence is represented within lexical-level processes, while the structure of phonetic categories is represented by pre-lexical processes (see McClelland, Mirman, and Holt, 2006, for a recent review of architectures based on this perspective; but see Gaskell and Marslen-Wilson, 1997, for an alternative perspective).

Within the domain of speech production, many studies have focused on interactive effects involving semantic and phonological information. Most speech production theories assume that at least two distinct stages (representing different types of information) are involved in mapping semantic representations onto abstract, long-term memory representations of word form (e.g., mapping 'flying nocturnal mammal' to "bat"). The first processing stage involves selection of a word to express an intended concept, and the second involves retrieval of sound information corresponding to the selected word from long-term memory (Garrett, 1980). We refer to the former as lexical selection and the latter as lexical phonological processing. Critically, semantic information is represented within lexical selection processes, while phonological information is represented within lexical phonological processes. Chronometric studies, spontaneous speech error analyses, and studies of indiv...
Foreign-accented speech can be difficult to understand but listeners can adapt to novel talkers and accents with appropriate experience. Previous studies have demonstrated talker-independent but accent-dependent learning after training on multiple talkers from a single language background. Here, listeners instead were exposed to talkers from five language backgrounds during training. After training, listeners generalized their learning to novel talkers from language backgrounds both included and not included in the training set. These findings suggest that generalization of foreign-accent adaptation is the result of exposure to systematic variability in accented speech that is similar across talkers from multiple language backgrounds.
This paper describes the development of the Wildcat Corpus of native- and foreign-accented English, a corpus containing scripted and spontaneous speech recordings from 24 native speakers of American English and 52 non-native speakers of English. The core element of this corpus is a set of spontaneous speech recordings, for which a new method of eliciting dialogue-based, laboratory-quality speech recordings was developed (the Diapix task). Dialogues between two native speakers of English, between two non-native speakers of English (with either shared or different L1s), and between one native and one non-native speaker of English are included and analyzed in terms of general measures of communicative efficiency. The overall finding was that pairs of native talkers were most efficient, followed by mixed native/non-native pairs and non-native pairs with shared L1. Non-native pairs with different L1s were least efficient. These results support the hypothesis that successful speech communication depends both on the alignment of talkers to the target language and on the alignment of talkers to one another in terms of native language background.
The current study examined the neural systems underlying lexically conditioned phonetic variation in spoken word production. Participants were asked to read aloud singly presented words that either had a voiced minimal pair (MP) neighbor (e.g., cape) or lacked a minimal pair (NMP) neighbor (e.g., cake). The voiced neighbor never appeared in the stimulus set. Behavioral results showed longer voice-onset time for MP target words, replicating earlier behavioral results (Baese-Berk & Goldrick, 2009). fMRI results revealed reduced activation for MP words compared to NMP words in a network including the left posterior superior temporal gyrus, the supramarginal gyrus, the inferior frontal gyrus, and the precentral gyrus. These findings support cascade models of spoken word production and show that neural activation at the lexical level modulates activation in those brain regions involved in lexical selection, phonological planning, and, ultimately, motor plans for production. The facilitatory effects for words with minimal pair neighbors suggest that competition effects reflect the overlap inherent in the phonological representations of the target word and its minimal pair neighbor.
Speech perception abilities vary substantially across listeners, particularly in adverse conditions, including those stemming from environmental degradation (e.g., noise) or from talker-related challenges (e.g., nonnative or disordered speech). This study examined adult listeners' recognition of words in phrases produced by six talkers representing three speech varieties: a nonnative accent (Spanish-accented English), a regional dialect (Irish English), and a disordered variety (ataxic dysarthria). Semantically anomalous phrases from these talkers were presented in a transcription task, and intelligibility scores (percent words correct) were compared across the three speech varieties. Three cognitive-linguistic abilities (receptive vocabulary, cognitive flexibility, and inhibitory control of attention) were assessed as possible predictors of individual word recognition performance. Intelligibility scores for the Spanish accent were significantly correlated with scores for both the Irish English and the ataxic dysarthria conditions. Scores for the Irish English and dysarthric speech, in contrast, were not correlated. Furthermore, receptive vocabulary was the only cognitive-linguistic assessment that significantly predicted intelligibility scores. These results suggest that, rather than a global skill of perceiving speech that deviates from native dialect norms, listeners may possess specific abilities to overcome particular types of acoustic-phonetic deviation. In addition, a larger vocabulary appears to benefit the intelligibility of speech that deviates from one's typical dialect norms.
Spoken language requires individuals to both perceive and produce speech. Because both processes access lexical and sublexical representations, it is commonly assumed that perception and production involve cooperative processes. However, few studies have directly examined the nature of the relationship between the two modalities, particularly how producing speech influences speech perception. In a series of experiments, we examine the counterintuitive finding that learning perceptual representations can be disrupted by producing tokens during training. We investigate whether this disruption can be alleviated by prior experience with the speech sounds, and whether the cause of the disruption is production of the particular sound being learned or a more general conflict between the production system and the system that develops new perceptual representations. Our results suggest a more competitive relationship between perception and production than is commonly assumed, and indicate that both demands inherent to production and broader cognitive demands modulate this relationship.
Humans unconsciously track a wide array of distributional characteristics in their sensory environment. Recent research in spoken-language processing has demonstrated that the speech rate surrounding a target region within an utterance influences which words, and how many words, listeners hear later in that utterance. On the basis of hypotheses that listeners track timing information in speech over long timescales, we investigated the possibility that the perception of words is sensitive to speech rate over such a timescale (e.g., an extended conversation). Results demonstrated that listeners tracked variation in the overall pace of speech over an extended duration (analogous to that of a conversation that listeners might have outside the lab) and that this global speech rate influenced which words listeners reported hearing. The effects of speech rate became stronger over time. Our findings are consistent with the hypothesis that neural entrainment by speech occurs on multiple timescales, some lasting more than an hour.
During speech communication, both environmental noise and nonnative accents can create adverse conditions for the listener. Individuals recruit additional cognitive, linguistic, and/or perceptual resources when faced with such challenges. Furthermore, listeners vary in their ability to understand speech in adverse conditions. In the present study, we compared individuals' receptive vocabulary, inhibition, rhythm perception, and working memory with transcription accuracy (i.e., intelligibility scores) for four adverse listening conditions: native speech in speech-shaped noise, native speech with a single-talker masker, nonnative-accented speech in quiet, and nonnative-accented speech in speech-shaped noise. The results showed that intelligibility scores for similar types of adverse listening conditions (i.e., with the same environmental noise or nonnative-accented speech) significantly correlated with one another. Furthermore, receptive vocabulary positively predicted performance globally across adverse listening conditions, and working memory positively predicted performance for the nonnative-accented speech conditions. Taken together, these results indicate that some cognitive resources may be recruited for all adverse listening conditions, while specific additional resources may be engaged when people are faced with certain types of listening challenges.