This study proposes a new methodology for determining the relationship between child-directed speech and child speech in early acquisition. It illustrates the use of this methodology in investigating the relationship between the morphological richness of child-directed speech and the speed of morphological development in child speech. Both variables are defined in terms of mean size of paradigm (MSP) and estimated in a set of longitudinal spontaneous speech corpora of nine children and their caretakers. The children are aged 1;3–3;0, acquiring nine different languages that vary in terms of morphological richness. The main result is that the degree of morphological richness in child-directed speech is positively related to the speed of development of noun and verb paradigms in child speech.
This article describes in detail several explicit computational methods for approaching such questions in phonology as the vowel/consonant distinction, the nature of vowel harmony systems, and syllable structure, appealing solely to distributional information. Beginning with the vowel/consonant distinction, we consider a method for its discovery by the Russian linguist Boris Sukhotin, and compare it to two newer methods that are of more general computational and theoretical interest today. The first is based on spectral decomposition of matrices, allowing for dimensionality reduction in a finely controlled way, and the second is based on finding maximum-likelihood parameters for a hidden Markov model. While all three methods work for discovering the fairly robust vowel/consonant distinction, we extend the newer ones to the discovery of vowel harmony, and in the case of the probabilistic model, to the discovery of some aspects of syllable structure.
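The abstract above does not spell out Sukhotin's method, so the following is only a minimal sketch of the algorithm as it is commonly described, not the article's own implementation. It assumes that orthographic letters stand in for phonemes and that adjacency is counted within words only; the function name `sukhotin_vowels` is hypothetical.

```python
from collections import defaultdict


def sukhotin_vowels(words):
    """Guess the vowel set from adjacency counts alone (commonly described
    form of Sukhotin's algorithm; letters stand in for phonemes)."""
    # Symmetric adjacency counts; pairs of identical symbols are ignored,
    # which keeps the diagonal of the matrix at zero.
    counts = defaultdict(lambda: defaultdict(int))
    for word in words:
        for a, b in zip(word, word[1:]):
            if a != b:
                counts[a][b] += 1
                counts[b][a] += 1

    # Every symbol starts out as a consonant, scored by its row sum.
    sums = {sym: sum(row.values()) for sym, row in counts.items()}
    vowels = set()

    while True:
        consonants = {s: v for s, v in sums.items() if s not in vowels}
        if not consonants:
            break
        best = max(consonants, key=consonants.get)
        if consonants[best] <= 0:
            break  # no remaining symbol looks like a vowel
        vowels.add(best)
        # Penalise the remaining consonants for co-occurring with the new vowel.
        for sym in consonants:
            if sym != best:
                sums[sym] -= 2 * counts[sym][best]
    return vowels


if __name__ == "__main__":
    print(sorted(sukhotin_vowels(["banana", "tomato", "potato"])))
```

On very small word lists the result can be unreliable, since a frequent consonant may have the highest initial row sum; the method relies on vowels and consonants tending to alternate across a reasonably large sample.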
This study introduces a new metric for assessing the inflectional diversity of morphologically analyzed language transcripts. The proposed metric is based on the intuitive notion of mean size of paradigm (MSP) and makes extensive use of random sampling procedures for normalization purposes. This approach is systematically evaluated on the basis of large sets of Dutch acquisition corpora, including both child speech and child-directed speech. It is shown to be an efficient way of controlling for sample size in the measurement of inflectional diversity, as well as a suitable method for assessing inflectional development in longitudinal data. MSP is compared with ID (inflectional diversity) introduced by Malvern, Richards, Chipere, and Durán (2004).
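As an illustration of the idea in the abstract above, the sketch below computes MSP as the number of distinct inflected forms divided by the number of distinct lemmas, and controls for sample size by averaging over random subsamples of a fixed number of tokens. This is one plausible reading, not the study's exact procedure: the function names, the `(lemma, form)` token representation, sampling without replacement, and the default of 100 subsamples are all assumptions.

```python
import random


def msp(tokens):
    """Mean size of paradigm: distinct (lemma, form) pairs per distinct lemma.

    `tokens` is a non-empty list of (lemma, form) pairs taken from a
    morphologically analyzed transcript (assumed representation).
    """
    if not tokens:
        raise ValueError("tokens must not be empty")
    lemmas = {lemma for lemma, _ in tokens}
    forms = set(tokens)
    return len(forms) / len(lemmas)


def normalized_msp(tokens, sample_size, n_samples=100, seed=0):
    """Average MSP over random subsamples of `sample_size` tokens, so that
    transcripts of different lengths are compared at the same size."""
    if not 1 <= sample_size <= len(tokens):
        raise ValueError("sample_size must be between 1 and len(tokens)")
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        # Draw tokens without replacement (an assumption of this sketch).
        total += msp(rng.sample(tokens, sample_size))
    return total / n_samples


if __name__ == "__main__":
    transcript = [
        ("walk", "walk"), ("walk", "walks"), ("walk", "walked"),
        ("dog", "dog"), ("dog", "dogs"), ("dog", "dog"),
    ]
    print(msp(transcript))
    print(normalized_msp(transcript, sample_size=3))
```

Fixing the subsample size matters because raw MSP tends to grow with transcript length: longer samples give each lemma more chances to appear in additional forms.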
Due to their pictographic nature, emojis come with baked-in, grounded semantics. Although this makes emojis promising candidates for new forms of more accessible communication, it is still unknown to what degree humans agree on the inherent meaning of emojis when encountering them outside of concrete textual contexts. To bridge this gap, we collected a crowdsourced dataset (made publicly available) of one-word descriptions for 1,289 emojis presented to participants with no surrounding text. The emojis and their interpretations were then examined for ambiguity. We find that, with 30 annotations per emoji, 16 emojis (1.2%) are completely unambiguous, whereas 55 emojis (4.3%) are so ambiguous that the variation in their descriptions is as high as that in randomly chosen descriptions. Most emojis lie between these two extremes. Furthermore, investigating the ambiguity of different types of emojis, we find that emojis representing symbols from established, yet not cross-culturally familiar code books (e.g., zodiac signs, Chinese characters) are most ambiguous. We conclude by discussing design implications.
This article reviews research on the unsupervised learning of morphology, that is, the induction of morphological knowledge with no prior knowledge of the language beyond the training texts. This is an area of considerable activity over the period from the mid 1990s to the present. It is of particular interest to linguists because it provides a good example of a domain in which complex structures must be induced by the language learner, and successes in this area have all relied on quantitative models that in various ways focus on model complexity and on goodness of fit to the data.