The MacArthur-Bates Communicative Development Inventories (CDIs) are a widely used family of parent-report instruments for easy and inexpensive data-gathering about early language acquisition. CDI data have been used to explore a variety of theoretically important topics, but, with few exceptions, researchers have had to rely on data collected in their own lab. In this paper, we remedy this issue by presenting Wordbank, a structured database of CDI data combined with a browsable web interface. Wordbank archives CDI data across languages and labs, providing a resource for researchers interested in early language, as well as a platform for novel analyses. The site allows interactive exploration of patterns of vocabulary growth at the level of both individual children and particular words. We also introduce wordbankr, a software package for connecting to the database directly. Together, these tools extend the abilities of students and researchers to explore quantitative trends in vocabulary development.
The ideal of scientific progress is that we accumulate measurements and integrate these into theory, but recent discussion of replicability issues has cast doubt on whether psychological research conforms to this model. Developmental research—especially with infant participants—also has discipline‐specific replicability challenges, including small samples and limited measurement methods. Inspired by collaborative replication efforts in cognitive and social psychology, we describe a proposal for assessing and promoting replicability in infancy research: large‐scale, multi‐laboratory replication efforts aiming for a more precise understanding of key developmental phenomena. The ManyBabies project, our instantiation of this proposal, will not only help us estimate how robust and replicable these phenomena are, but also gain new theoretical insights into how they vary across ages, linguistic communities, and measurement methods. This project has the potential for a variety of positive outcomes, including less‐biased estimates of theoretically important effects, estimates of variability that can be used for later study planning, and a series of best‐practices blueprints for future infancy research.
Why do children learn some words earlier than others? The order in which words are acquired can provide clues about the mechanisms of word learning. In a large-scale corpus analysis, we use parent-report data from over 32,000 children to estimate the acquisition trajectories of around 400 words in each of 10 languages, predicting them on the basis of independently derived properties of the words’ linguistic environment (from corpora) and meaning (from adult judgments). We examine the consistency and variability of these predictors across languages, by lexical category, and over development. The patterning of predictors across languages is quite similar, suggesting similar processes in operation. In contrast, the patterning of predictors across different lexical categories is distinct, in line with theories that posit different factors at play in the acquisition of content words and function words. By leveraging data at a significantly larger scale than previous work, our analyses identify candidate generalizations about the processes underlying word learning across languages.
A data-driven exploration of children's early language learning across different languages, providing an empirical reference and a new theoretical framework. This book examines variability and consistency in children's language learning across different languages and cultures, drawing on Wordbank, an open database with data from more than 75,000 children and twenty-nine languages or dialects. This big data approach makes the book the most comprehensive cross-linguistic analysis to date of early language learning. Moreover, its data-driven picture of which aspects of language learning are consistent across languages suggests constraints on the nature of children's language learning mechanisms. The book provides both a theoretical framework for scholars of language learning, language, and human cognition, and a resource for future research. Wordbank archives data from parents' reports about their children's language learning using instruments in the MacArthur-Bates Communicative Development Inventory (CDI); its goal is to make CDI data available for study and analysis. After an overview of practical and theoretical issues, each of the book's empirical chapters applies a particular analysis to the Wordbank dataset, considering such topics as vocabulary size, demographic variation, syntactic and semantic categories, and the relationship between vocabulary growth and grammar. The final three chapters draw on the preceding chapters to quantify variability and consistency, consider the bird's eye view of language acquisition afforded by the data, and reflect on methodology.
Word-object co-occurrence statistics are a powerful information source for vocabulary learning, but there is considerable debate about how learners actually use them. While some theories hold that learners accumulate graded, statistical evidence about multiple referents for each word, others suggest that they track only a single candidate referent. In two large-scale experiments, we show that neither account is sufficient: Cross-situational learning involves elements of both. Further, the empirical data are captured by a computational model that formalizes how memory and attention interact with co-occurrence tracking. Together, the data and model unify opposing positions in a complex debate and underscore the value of understanding the interaction between computational and algorithmic levels of explanation.
A key question in early word learning is how children cope with the uncertainty in natural naming events. One potential mechanism for uncertainty reduction is cross-situational word learning – tracking word/object co-occurrence statistics across naming events. But empirical and computational analyses of cross-situational learning have made strong assumptions about the nature of naming event ambiguity, assumptions that have been challenged by recent analyses of natural naming events. This paper shows that learning from ambiguous natural naming events depends on perspective. Natural naming events from parent–child interactions were recorded from both a third-person tripod-mounted camera and from a head-mounted camera that produced a ‘child’s-eye’ view. Following the human simulation paradigm, adults were asked to learn artificial language labels by integrating across the most ambiguous of these naming events. Significant learning was found only from the child’s perspective, pointing to the importance of considering statistical learning from an embodied perspective.
A critical question about the nature of human learning is whether it is an all-or-none or a gradual, accumulative process. Associative and statistical theories of word learning rely critically on the later assumption: that the process of learning a word's meaning unfolds over time. That is, learning the correct referent for a word involves the accumulation of partial knowledge across multiple instances. Some theories also make an even stronger claim: Partial knowledge of one word–object mapping can speed up the acquisition of other word–object mappings. We present three experiments that test and verify these claims by exposing learners to two consecutive blocks of cross-situational learning, in which half of the words and objects in the second block were those that participants failed to learn in Block 1. In line with an accumulative account, Re-exposure to these mis-mapped items accelerated the acquisition of both previously experienced mappings and wholly new word–object mappings. But how does partial knowledge of some words speed the acquisition of others? We consider two hypotheses. First, partial knowledge of a word could reduce the amount of information required for it to reach threshold, and the supra-threshold mapping could subsequently aid in the acquisition of new mappings. Alternatively, partial knowledge of a word's meaning could be useful for disambiguating the meanings of other words even before the threshold of learning is reached. We construct and compare computational models embodying each of these hypotheses and show that the latter provides a better explanation of the empirical data.
Cross-situational word learning, like any statistical learning problem, involves tracking the regularities in the environment. But the information that learners pick up from these regularities is dependent on their learning mechanism. This paper investigates the role of one type of mechanism in statistical word learning: competition. Competitive mechanisms would allow learners to find the signal in noisy input, and would help to explain the speed with which learners succeed in statistical learning tasks. Because cross-situational word learning provides information at multiple scales – both within and across trials/situations –learners could implement competition at either or both of these scales. A series of four experiments demonstrate that cross-situational learning involves competition at both levels of scale, and that these mechanisms interact to support rapid learning. The impact of both of these mechanisms is then considered from the perspective of a process-level understanding of cross-situational learning.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.