Infants in the early stages of word learning have difficulty learning lexical neighbors (i.e., word pairs that differ by a single phoneme), despite being able to discriminate the same contrast in a purely auditory task. While prior work has focused on top-down explanations for this failure (e.g., task demands, lexical competition), none has examined whether bottom-up acoustic-phonetic factors play a role. We hypothesized that lexical neighbor learning could be improved by incorporating greater acoustic variability in the words being learned, as this may buttress still-developing phonetic categories and help infants identify the relevant contrastive dimension. Infants were exposed to pictures accompanied by labels spoken by either a single speaker or multiple speakers. At test, infants in the single-speaker condition failed to recognize the difference between the two words, whereas infants who heard multiple speakers discriminated between them.
Classic approaches to word learning emphasize the problem of referential ambiguity: in any naming situation, the referent of a novel word must be selected from many possible objects, properties, actions, etc. To solve this problem, researchers have posited numerous constraints and inference strategies, but they assume that determining the referent of a novel word is isomorphic to learning it. We present an alternative model in which referent selection is an online process that is independent of long-term learning. This two-timescale approach gives the developing system considerable power. We illustrate this with a dynamic associative model in which referent selection is simulated as dynamic competition between candidate referents, and learning is simulated using associative (Hebbian) learning. This model can account for a range of findings, including the delay in expressive vocabulary relative to receptive vocabulary, learning under high degrees of referential ambiguity using cross-situational statistics, accelerating (vocabulary explosion) and decelerating (power-law) learning rates, fast-mapping by mutual exclusivity (and differences in bilinguals), improvements in familiar word recognition with development, and correlations between individual differences in speed of processing and learning. Five theoretical points are illustrated. 1) Word learning does not require specialized processes – general association learning buttressed by dynamic competition can account for much of the literature. 2) The processes of recognizing familiar words are no different from those that support novel words (e.g., fast-mapping). 3) Online competition may allow the network (or child) to leverage information available in the task to augment performance or behavior despite what might be relatively slow learning or poor representations.
4) Even associative learning is more complex than previously thought – a major contributor to performance is the pruning of incorrect associations between words and referents. 5) Finally, the model illustrates that learning and referent selection/word recognition, though logically distinct, can be deeply and subtly related, as phenomena like speed of processing and mutual exclusivity may derive in part from the way learning shapes the system. As a whole, this suggests more sophisticated ways of describing the interaction between situation-time and developmental-time processes and points to the need to consider such interactions as a primary determinant of development and processing in children.
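The two-timescale idea can be illustrated with a toy sketch in Python. It is not the authors' dynamic associative model: the lexicon size, the replicator-style competition rule, and the learning and pruning rates below are hypothetical choices. On each naming event, the objects on screen compete for the heard word (fast timescale); a Hebbian update driven by the outcome of that competition, including pruning of links to objects that lost, carries the long-term learning (slow timescale).

```python
import numpy as np

N_WORDS = 8      # hypothetical lexicon; word i's true referent is object i
N_ON_SCREEN = 3  # objects present at each naming event (target + 2 foils)


def compete(assoc_row, present, steps=30, rate=0.5):
    """Fast timescale: on-screen objects compete for the heard word.
    Activation starts from current word-object associations and is sharpened
    by replicator-style dynamics (above-average objects grow, others shrink)."""
    act = np.where(present, assoc_row, 0.0)
    act = act / act.sum()
    for _ in range(steps):
        act = act + rate * act * (act - np.sum(act ** 2))
        act = act / act.sum()  # guard against floating-point drift
    return act


def train(n_trials=2000, lr=0.1, prune=0.05, seed=0):
    """Slow timescale: Hebbian learning driven by each competition's outcome."""
    rng = np.random.default_rng(seed)
    assoc = np.full((N_WORDS, N_WORDS), 0.01)  # weak, uniform initial links
    for _ in range(n_trials):
        word = rng.integers(N_WORDS)
        others = np.delete(np.arange(N_WORDS), word)
        foils = rng.choice(others, N_ON_SCREEN - 1, replace=False)
        present = np.zeros(N_WORDS, dtype=bool)
        present[word] = True
        present[foils] = True
        act = compete(assoc[word], present)
        # Hebbian strengthening in proportion to post-competition activation
        assoc[word] += lr * act
        # pruning: weaken links to on-screen objects that lost the competition
        assoc[word, present] *= 1.0 - prune * (1.0 - act[present])
    return assoc


assoc = train()
print("learned referent for each word:", assoc.argmax(axis=1))
```

No single trial is unambiguous, but the correct object is present every time its word is heard while foils vary, so cross-situational statistics let the correct link pull ahead, and competition then amplifies that advantage.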
Thirty years of research has uncovered the broad principles that characterize spoken word processing across listeners. However, there have been few systematic investigations of individual differences. Such an investigation could help refine models of word recognition by indicating which processing parameters are likely to vary, and could also have important implications for work on language impairment. The present study begins to fill this gap by relating individual differences in overall language ability to variation in online word recognition processes. Using the visual world paradigm, we evaluated online spoken word recognition in adolescents who varied in both basic language abilities and nonverbal cognitive abilities. Eye movements to target, cohort, and rhyme objects were monitored during spoken word recognition as an index of lexical activation. Adolescents with poor language skills showed fewer looks to the target and more fixations to the cohort and rhyme competitors. These results were compared to a number of variants of the TRACE model (McClelland & Elman, 1986) that were constructed to test a range of theoretical approaches to language impairment: impairments at the sensory and phonological levels, vocabulary size, and generalized slowing. None of these was strongly supported; variation in lexical decay offered the best fit. Thus, basic word recognition processes like lexical decay may offer a new way to characterize processing differences in language impairment.
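To make the lexical-decay idea concrete, here is a toy leaky-accumulator sketch in Python. It is not TRACE: there are no feature or phoneme layers and no lateral inhibition, and the input-overlap pattern, gain, decay values, and the Luce-choice scaling constant k are all hypothetical. Activations at word offset are converted into predicted fixation proportions with a Luce choice rule.

```python
import numpy as np


def simulate(decay, n_steps=40, gain=0.05):
    """Leaky lexical units for a target, a cohort, and a rhyme competitor.
    Hypothetical overlap: the cohort matches only the first half of the input,
    the rhyme only the second half, and the target matches throughout."""
    half = n_steps // 2
    overlap = np.zeros((n_steps, 3))  # columns: target, cohort, rhyme
    overlap[:, 0] = 1.0
    overlap[:half, 1] = 1.0
    overlap[half:, 2] = 1.0
    act = np.zeros(3)
    history = np.empty((n_steps, 3))
    for t in range(n_steps):
        # each unit loses a proportion `decay` of its activation per time step
        act = (1.0 - decay) * act + gain * overlap[t]
        history[t] = act
    return history


def fixation_proportions(activations, k=7.0):
    """Luce choice rule with a hypothetical scaling constant k."""
    strength = np.exp(k * activations)
    return strength / strength.sum()


for decay in (0.1, 0.3):
    p = fixation_proportions(simulate(decay)[-1])
    print(f"decay={decay}: target={p[0]:.2f} cohort={p[1]:.2f} rhyme={p[2]:.2f}")
```

With these made-up parameters, raising decay from 0.1 to 0.3 lowers the target's share of predicted fixations at word offset and raises the cohort's, the same qualitative direction as the poor-language group above; whether that holds in full TRACE depends on parameters this sketch leaves out.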
Most theories of categorization emphasize how continuous perceptual information is mapped to categories. However, equally important are the informational assumptions of a model: the type of information subserving this mapping. This is crucial in speech perception, where the signal is variable and context-dependent. This study assessed the informational assumptions of several models of speech categorization, in particular the number of cues that form the basis of categorization and whether these cues represent the input veridically or have undergone compensation. We collected a corpus of 2880 fricative productions (Jongman, Wayland & Wong, 2000) spanning many talker and vowel contexts and measured 24 cues for each. A subset was also presented to listeners in an 8-alternative forced-choice (8AFC) phoneme categorization task. We then trained a common classification model based on logistic regression to categorize the fricative from the cue values, and manipulated the information in the training set to contrast 1) models based on a small number of invariant cues, 2) models using all cues without compensation, and 3) models in which cues underwent compensation for contextual factors. Compensation was modeled by Computing Cues Relative to Expectations (C-CuRE), a new approach to compensation that preserves fine-grained detail in the signal. Only the compensation model achieved accuracy similar to that of listeners and showed the same effects of context. Thus, even simple categorization metrics can overcome the variability in speech when sufficient information is available and compensation schemes like C-CuRE are employed.
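A minimal sketch of the compensation idea, in Python with synthetic data. It assumes a single made-up cue, two categories, and talker as the only context factor; the expectation for each token is simply its talker's mean cue value (the fit of a regression of the cue on talker dummy codes), and the residual is what the classifier sees. The corpus, talker offsets, noise level, and the hand-rolled logistic regression are illustrative, not the study's 24-cue model.

```python
import numpy as np


def make_corpus(n_talkers=10, n_per_cat=50, seed=1):
    """Synthetic stand-in for a fricative corpus (hypothetical values): one cue,
    two categories 1.0 apart, and a talker offset shifting all of a talker's tokens."""
    rng = np.random.default_rng(seed)
    offsets = np.linspace(-3.0, 3.0, n_talkers)  # talker effects dwarf the contrast
    talker = np.repeat(np.arange(n_talkers), 2 * n_per_cat)
    category = np.tile(np.repeat([0, 1], n_per_cat), n_talkers)
    cue = (offsets[talker] + np.where(category == 1, 0.5, -0.5)
           + rng.normal(0.0, 0.2, talker.size))
    return cue, talker, category


def relative_to_expectation(cue, talker):
    """Simplified C-CuRE-style compensation: the expectation is the talker's
    mean cue value; keep the residual (cue minus expectation)."""
    talker_mean = np.bincount(talker, weights=cue) / np.bincount(talker)
    return cue - talker_mean[talker]


def fit_logistic(x, y, lr=0.5, n_iter=2000):
    """Batch-gradient-descent logistic regression with an intercept."""
    X = np.column_stack([np.ones_like(x), x])
    w = np.zeros(2)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-np.clip(X @ w, -30.0, 30.0)))
        w -= lr * X.T @ (p - y) / y.size
    return w


def accuracy(w, x, y):
    X = np.column_stack([np.ones_like(x), x])
    return np.mean((X @ w > 0).astype(int) == y)


cue, talker, category = make_corpus()
raw_acc = accuracy(fit_logistic(cue, category), cue, category)
rel = relative_to_expectation(cue, talker)
cure_acc = accuracy(fit_logistic(rel, category), rel, category)
print(f"raw cue: {raw_acc:.2f}   cue relative to talker: {cure_acc:.2f}")
```

Because the talker offsets here are much larger than the category difference, the raw cue supports only modest accuracy, while the residualized cue separates the categories well. This mirrors the logic of C-CuRE, not its full implementation (which also handles vowel context and many cues).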
Recent evidence (Maye, Werker & Gerken, 2002) suggests that statistical learning may be an important mechanism for the acquisition of phonetic categories in the infant's native language. We examined the sufficiency of this hypothesis and its implications for development by implementing a statistical learning mechanism in a computational model based on a Mixture of Gaussians (MOG) architecture. Statistical learning alone was found to be insufficient for phonetic category learning; an additional competition mechanism was required to successfully learn the categories in the input. When competition was added to the MOG architecture, this class of models successfully accounted for developmental enhancement and loss of sensitivity to phonetic contrasts. Moreover, the MOG with competition model was used to explore a potentially important distributional property of early speech categories, sparseness, in which portions of the space between phonetic categories are unmapped. Sparseness was found in all successful models and quickly emerged during development even when the initial parameters favored continuous representations with no gaps. The implications of these models for phonetic category learning in infants are discussed.

Infants face a difficult problem in acquiring their native language because the acoustic/phonetic variability in the input far exceeds the limited number of distinctive differences that define language-specific phonemes. How do infants attend to the relevant information that distinguishes words? Recent evidence suggests that phonemic categories may be induced, in whole or in part, by a rapid statistical learning mechanism that is sensitive to the distributional properties of phonetic input (Maye, Werker & Gerken, 2002; Maye, Weiss & Aslin, 2008).
This evidence suggests that the detailed frequency of occurrence of tokens along continuous speech dimensions plays a crucial role in the formation and modification of phonemic categories.

The present paper describes a computational model of statistical speech category learning that examines the necessary and sufficient mechanisms needed to account for known empirical data from infants, and the implications of those mechanisms for early speech categories. We demonstrate that statistical learning alone is insufficient: competition is also required. However, once this feature is added to the model, it can account for a number of developmental trajectories in speech category learning. Finally, we examine the possibility...

Statistical Learning and Development

The classic view of speech perception in both adults (cf. Liberman, Harris, Hoffman & Griffith, 1957) and infants (cf. Eimas, Siqueland, Jusczyk & Vigorito, 1971; see Jusczyk, 1997) is that stop consonants are perceived categorically. However, more recent evidence confirms within-category sensitivity in both adults (Pisoni & Tash, 1974; Carney, Widen & Viemeister, 1979; Miller, 1997) and infants (Miller & Eimas, 1996; McMurray & Aslin, 2005). Nevertheless, adults and infants have a bias t...
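The role of competition can be sketched with an online mixture of Gaussians in Python. This is a simplified, hypothetical version, not the paper's model: one phonetic dimension, a fixed SD shared by all components, made-up bimodal input centred at 0 and 50, and competition implemented as winner-take-all updating of the mixing weights (one possible mechanism; the paper's update rules may differ).

```python
import numpy as np


def train_mog(competition, n_samples=6000, n_components=20, seed=0,
              lr_mean=0.05, lr_weight=0.01, sigma=10.0):
    """Online mixture of Gaussians over one phonetic dimension (e.g. VOT in ms).
    competition=True : only the best-matching component's mixing weight grows.
    competition=False: every weight moves toward its responsibility (soft update)."""
    rng = np.random.default_rng(seed)
    means = np.linspace(-20.0, 70.0, n_components)  # start tiled across the dimension
    weights = np.full(n_components, 1.0 / n_components)
    for _ in range(n_samples):
        # hypothetical bimodal input: two categories centred at 0 and 50
        x = rng.normal(0.0 if rng.random() < 0.5 else 50.0, 5.0)
        # responsibilities proportional to weight * N(x; mean, sigma); the shared
        # sigma makes the Gaussian normalizing constant cancel, so it is omitted
        log_r = np.log(weights) - 0.5 * ((x - means) / sigma) ** 2
        r = np.exp(log_r - log_r.max())
        r /= r.sum()
        means += lr_mean * r * (x - means)  # each mean moves toward x by its responsibility
        if competition:
            weights[np.argmax(r)] += lr_weight  # winner-take-all on mixing weights
            weights /= weights.sum()
        else:
            weights = (1.0 - lr_weight) * weights + lr_weight * r
    return means, weights


def top2_mass(weights):
    """Share of total mixing weight held by the two heaviest components."""
    return np.sort(weights)[-2:].sum()


for comp in (False, True):
    _, w = train_mog(competition=comp)
    print(f"competition={comp}: top-2 weight mass = {top2_mass(w):.2f}")
```

Without competition, the soft update leaves weight spread across several overlapping components near each mode; with winner-take-all competition, the weights concentrate on roughly one component per category.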
It is well attested that 14-month-olds have difficulty learning similar-sounding words (e.g., bih/dih), despite their excellent phonetic discrimination abilities. However, Rost and McMurray (2009) recently demonstrated that 14-month-olds’ minimal pair learning can be improved by presenting the words in the voices of multiple talkers. This study investigates which components of the variability in multi-talker input improved infants’ processing, assessing both the phonologically contrastive aspects of the speech stream and the phonologically irrelevant indexical and suprasegmental aspects. In the first two experiments, speaker was held constant while cues to word-initial voicing were systematically manipulated. Infants failed in both cases. The third experiment introduced variability in speaker, but voicing cues were invariant within each category. Infants in this condition learned the words. We conclude that aspects of the speech signal that have typically been thought of as noise are in fact valuable information – signal – for the young word learner.
During speech perception, listeners make judgments about the phonological category of sounds by taking advantage of multiple acoustic cues for each phonological contrast. Perceptual experiments have shown that listeners weight these cues differently. How do listeners weight and combine acoustic cues to arrive at an overall estimate of the category for a speech sound? Here, we present several simulations using mixture-of-Gaussians models that learn cue weights and combine cues on the basis of their distributional statistics. We show that a cue-weighting metric in which cues are weighted as a function of their reliability at distinguishing phonological categories provides a good fit to the perceptual data obtained from human listeners, but only when these weights emerge through the dynamics of learning. These results suggest that cue weights can be readily extracted from the speech signal through unsupervised learning processes.
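One way to express weighting by reliability is sketched below in Python. The reliability metric (a d'-style ratio of the category-mean difference to the pooled SD), the cue scaling, and the VOT and onset-F0 category statistics are assumptions for illustration, not the metric or values from the simulations; the sketch also computes weights directly from known statistics rather than letting them emerge through learning.

```python
import numpy as np


def reliability_weights(mu_a, mu_b, sd_a, sd_b):
    """Weight each cue by how well it separates two categories: a d'-style
    ratio of the distance between category means to the pooled SD,
    normalized so the weights sum to 1 (an assumed metric)."""
    d_prime = np.abs(mu_a - mu_b) / np.sqrt((sd_a ** 2 + sd_b ** 2) / 2.0)
    return d_prime / d_prime.sum()


def combine(token, mu_a, mu_b, weights):
    """Put each cue on a common scale (0 at the category midpoint, +/-1 at the
    category means, positive toward category B), then take the weighted sum.
    A positive result favors category B."""
    scaled = (token - (mu_a + mu_b) / 2.0) / ((mu_b - mu_a) / 2.0)
    return float(weights @ scaled)


# Hypothetical statistics for two cues: VOT (ms) and onset F0 (Hz).
mu_a = np.array([0.0, 200.0])   # category A (e.g. /b/)
mu_b = np.array([50.0, 220.0])  # category B (e.g. /p/)
sd_a = np.array([5.0, 20.0])
sd_b = np.array([10.0, 20.0])

w = reliability_weights(mu_a, mu_b, sd_a, sd_b)
token = np.array([30.0, 200.0])  # VOT leans B, F0 leans A
print("weights:", w.round(3))
print("reliability-weighted:", round(combine(token, mu_a, mu_b, w), 3))
print("equal weights:", round(combine(token, mu_a, mu_b, np.array([0.5, 0.5])), 3))
```

For this ambiguous token, reliability weighting lets the more informative VOT cue decide in favor of category B, whereas equal weights let the weaker F0 cue flip the decision to category A.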