Words which are expected to contain the same surface string of segments may, under identical prosodic circumstances, sometimes be realized with slight differences in duration. Some researchers have attributed such effects to differences in the words' underlying forms (incomplete neutralization), while others have suggested orthographic influence and extremely careful speech as the cause. In this paper, we demonstrate such sub-phonemic durational differences in Dutch, a language which some past research has found not to have such effects. Past literature has also shown that listeners can often make use of incomplete neutralization to distinguish apparent homophones. We extend perceptual investigations of this topic, and show that listeners can perceive even durational differences which are not consistently observed in production. We further show that a difference which is primarily orthographic rather than underlying can also create such durational differences. We conclude that a wide variety of factors, in addition to underlying form, can induce speakers to produce slight durational differences which listeners can also use in perception.
The current work examines native Korean speakers’ perception and production of stop contrasts in their native language (L1, Korean) and second language (L2, English), focusing on three acoustic dimensions that are all used, albeit to different extents, in both languages: voice onset time (VOT), f0 at vowel onset, and closure duration. Participants used all three cues to distinguish the L1 Korean three-way stop distinction in both production and perception. Speakers’ productions of the L2 English contrasts were reliably distinguished using both VOT and f0 (even though f0 is only a very weak cue to the English contrast), and, to a lesser extent, closure duration. In contrast to the relative homogeneity of the L2 productions, group patterns on a forced-choice perception task were less clear-cut, due to considerable individual differences in perceptual categorization strategies, with listeners using either primarily VOT duration, primarily f0, or both dimensions equally to distinguish the L2 English contrast. Differences in perception, which were stable across experimental sessions, were not predicted by individual variation in production patterns. This work suggests that reliance on multiple cues in representation of a phonetic contrast can form the basis for distinct individual cue-weighting strategies in phonetic categorization.
Variability is perhaps the most notable characteristic of speech, and it is particularly noticeable in spontaneous conversational speech. The current research examines how speakers realize the American English stops /p, k, b, g/ and flaps (ɾ from /t, d/), in casual conversation and in careful speech. Target consonants appear after stressed syllables (e.g., "lobby") or between unstressed syllables (e.g., "humanity"), in one of six segmental/word-boundary environments. This work documents the degree and types of variability listeners encounter and must parse. Findings show greater reduction in connected and spontaneous speech, greater reduction in high frequency phrases (but not within high frequency words), and greater reduction between unstressed syllables than after a stress. Although highly reduced productions of stops and flaps occur often, with approximant-like tokens even in careful speech, reduction does not lead to a large amount of overlap between phonological categories. Approximant-like realizations of expected stops and flaps in some conditions constitute the majority of tokens. This shows that reduced speech is something that listeners encounter, and must perceive, in a large proportion of the speech they hear.
We present the results of a large-scale study on speech perception, assessing the number and type of perceptual hypotheses which listeners entertain about possible phoneme sequences in their language. Dutch listeners were asked to identify gated fragments of all 1179 diphones of Dutch, providing a total of 488 520 phoneme categorizations. The results manifest orderly uptake of acoustic information in the signal. Differences across phonemes in the rate at which fully correct recognition was achieved arose as a result of whether or not potential confusions could occur with other phonemes of the language (long with short vowels, affricates with their initial components, etc.). These data can be used to improve models of how acoustic-phonetic information is mapped onto the mental lexicon during speech comprehension.