The MacArthur-Bates Communicative Development Inventories (CDIs) are a widely used family of parent-report instruments for easy and inexpensive data-gathering about early language acquisition. CDI data have been used to explore a variety of theoretically important topics, but, with few exceptions, researchers have had to rely on data collected in their own lab. In this paper, we remedy this issue by presenting Wordbank, a structured database of CDI data combined with a browsable web interface. Wordbank archives CDI data across languages and labs, providing a resource for researchers interested in early language, as well as a platform for novel analyses. The site allows interactive exploration of patterns of vocabulary growth at the level of both individual children and particular words. We also introduce wordbankr, a software package for connecting to the database directly. Together, these tools extend the abilities of students and researchers to explore quantitative trends in vocabulary development.
Previous work suggests that key factors for replicability, a necessary feature for theory building, include statistical power and appropriate research planning. These factors are examined by analyzing a collection of 12 standardized meta‐analyses on language development between birth and 5 years. With a median effect size of Cohen's d = .45 and a typical sample size of 18 participants, most research is underpowered (range = 6%–99%; median = 44%), and calculating power based on seminal publications is not a suitable strategy. Method choice can be improved, as shown in analyses of exclusion rates and effect size as a function of method. The article ends with a discussion of how to increase replicability in both language acquisition studies specifically and developmental research more generally.
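As a rough illustration of the power figures quoted in this abstract, the sketch below estimates power for an effect of d = .45 with 18 participants. The design is an assumption, since the abstract does not state one: a two-sided one-sample (or paired) test at alpha = .05, evaluated with a normal approximation rather than the exact noncentral t distribution. The function names `approx_power` and `n_for_power` are hypothetical and are not taken from the paper.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(d, n, alpha=0.05):
    """Approximate two-sided power for a one-sample or paired test.

    Assumption: normal approximation to the test statistic, which is
    slightly optimistic for small samples compared with an exact t test.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)   # two-sided critical value
    shift = d * sqrt(n)                 # expected value of the test statistic
    # Probability of landing in either rejection region.
    return (1 - z.cdf(z_crit - shift)) + z.cdf(-z_crit - shift)


def n_for_power(d, target=0.80, alpha=0.05):
    """Smallest sample size whose approximate power reaches the target."""
    n = 2
    while approx_power(d, n, alpha) < target:
        n += 1
    return n


power = approx_power(0.45, 18)
needed = n_for_power(0.45)
print(f"approximate power at n = 18: {power:.2f}")
print(f"approximate n for 80% power: {needed}")
```

Under these assumptions the estimate falls below one half, in the same region as the reported median, and reaching 80% power would take roughly twice as many participants. A between-subjects design would give lower power at the same sample size.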
Why do children learn some words earlier than others? The order in which words are acquired can provide clues about the mechanisms of word learning. In a large-scale corpus analysis, we use parent-report data from over 32,000 children to estimate the acquisition trajectories of around 400 words in each of 10 languages, predicting them on the basis of independently derived properties of the words’ linguistic environment (from corpora) and meaning (from adult judgments). We examine the consistency and variability of these predictors across languages, by lexical category, and over development. The patterning of predictors across languages is quite similar, suggesting similar processes in operation. In contrast, the patterning of predictors across different lexical categories is distinct, in line with theories that posit different factors at play in the acquisition of content words and function words. By leveraging data at a significantly larger scale than previous work, our analyses identify candidate generalizations about the processes underlying word learning across languages.
A data-driven exploration of children's early language learning across different languages, providing an empirical reference and a new theoretical framework. This book examines variability and consistency in children's language learning across different languages and cultures, drawing on Wordbank, an open database with data from more than 75,000 children and twenty-nine languages or dialects. This big data approach makes the book the most comprehensive cross-linguistic analysis to date of early language learning. Moreover, its data-driven picture of which aspects of language learning are consistent across languages suggests constraints on the nature of children's language learning mechanisms. The book provides both a theoretical framework for scholars of language learning, language, and human cognition, and a resource for future research. Wordbank archives data from parents' reports about their children's language learning using instruments in the MacArthur-Bates Communicative Development Inventory (CDI); its goal is to make CDI data available for study and analysis. After an overview of practical and theoretical issues, each of the book's empirical chapters applies a particular analysis to the Wordbank dataset, considering such topics as vocabulary size, demographic variation, syntactic and semantic categories, and the relationship between vocabulary growth and grammar. The final three chapters draw on the preceding chapters to quantify variability and consistency, consider the bird's eye view of language acquisition afforded by the data, and reflect on methodology.