This paper introduces the notion of frequent frames, distributional patterns based on co-occurrence patterns of words in sentences, then investigates the usefulness of this information in grammatical categorization. A frame is defined as two jointly occurring words with one word intervening. Qualitative and quantitative results from distributional analyses of six different corpora of child directed speech are presented in two experiments. In the analyses, words that were surrounded by the same frequent frame were categorized together. The results show that frequent frames yield very accurate categories. Furthermore, evidence from behavioral studies suggests that infants and adults are sensitive to frame-like units, and that adults use them to categorize words. This evidence, along with the success of frames in categorizing words, provides support for frames as a basis for the acquisition of grammatical categories.
We present a series of three analyses of young children's linguistic input to determine the distributional information it could plausibly offer to the process of grammatical category learning. Each analysis was conducted on four separate corpora from the CHILDES database (MacWhinney, 2000) of speech directed to children under 2;5. We show that, in accord with other findings, a distributional analysis which categorizes words based on their co-occurrence patterns with surrounding words successfully categorizes the majority of nouns and verbs. In Analyses 2 and 3, we attempt to make our analyses more closely relevant to natural language acquisition by adopting more realistic assumptions about how young children represent their input. In Analysis 2, we limit the distributional context by imposing phrase structure boundaries, and find that categorization improves even beyond that obtained from less limited contexts. In Analysis 3, we reduce the representation of input elements which young children might not fully process and we find that categorization is not adversely affected: Although noun categorization is worse than in Analyses 1 and 2, it is still good; and verb categorization actually improves. Overall, successful categorization of nouns and verbs is maintained across all analyses. These results provide promising support for theories of grammatical category formation involving distributional analysis, as long as these analyses are combined with appropriate assumptions about the child learner's computational biases and capabilities.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.