Recently, large-scale pre-trained neural network models such as BERT have achieved many state-of-the-art results in natural language processing. Recent work has explored the linguistic capacities of these models; however, no work has focused on their ability to generalize these capacities to novel words. This type of generalization is exhibited by humans (Berko, 1958) and is intimately related to morphology: humans are in many cases able to identify inflections of novel words in the appropriate context. This morphological capacity has not been previously tested in BERT models, and it is important for morphologically rich languages, which are under-studied in the literature on BERT's linguistic capacities. In this work, we investigate this question by testing monolingual and multilingual BERT models' ability to agree in number with novel plural words in English, French, German, Spanish, and Dutch. We find that many models are unable to reliably determine the plurality of novel words, suggesting potential deficiencies in the morphological capacities of BERT models.
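The probing setup described above can be thought of as a cloze-style test: present the model with a context containing a novel (nonce) plural noun and compare its scores for the singular versus plural verb form. The sketch below is a hedged illustration only: the nonce nouns, the template, and `toy_score` are all invented stand-ins (a real probe would read the candidate-verb probabilities from a masked language model such as BERT).

```python
# Hedged sketch of a cloze-style number-agreement probe with nonce words.
# In the actual experiments a masked language model would score each
# candidate verb; `toy_score` is an invented stand-in so this example
# runs without any model.

NONCE_NOUNS = ["wug", "blick", "dax"]      # invented novel nouns
TEMPLATE = "The {noun} {verb} here."       # agreement context

def toy_score(sentence: str, subject: str, verb: str) -> float:
    """Stand-in for a masked-LM probability P(verb | context).

    A real probe would mask the verb slot and read off the model's
    probability for each candidate; this toy version simply prefers
    "are" after an -s subject and "is" otherwise.
    """
    return float((verb == "are") == subject.endswith("s"))

def plural_agreement_accuracy(nouns):
    """Fraction of nonce nouns whose plural form selects the plural verb."""
    correct = 0
    for noun in nouns:
        plural = noun + "s"                # regular English plural
        scores = {
            verb: toy_score(TEMPLATE.format(noun=plural, verb=verb), plural, verb)
            for verb in ("is", "are")
        }
        correct += scores["are"] > scores["is"]
    return correct / len(nouns)
```

A model "passes" a nonce noun when the plural verb outscores the singular one in the plural context; accuracy over many such nouns is one way to quantify the generalization the abstract describes.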
Prior studies in multilingual language modeling (e.g., Cotterell et al., 2018; Mielke et al., 2019) disagree on whether or not inflectional morphology makes languages harder to model. We attempt to resolve the disagreement and extend those studies. We compile a larger corpus of 145 Bible translations in 92 languages and a larger number of typological features. We fill in missing typological data for several languages and consider corpus-based measures of morphological complexity in addition to expert-produced typological features. We find that several morphological measures are significantly associated with higher surprisal when LSTM models are trained with BPE-segmented data. We also investigate linguistically motivated subword segmentation strategies like Morfessor and Finite-State Transducers (FSTs) and find that these segmentation strategies yield better performance and reduce the impact of a language's morphology on language modeling.
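The BPE segmentation mentioned above greedily merges the most frequent adjacent symbol pair, starting from characters. A minimal self-contained sketch of the learning step, in the style of Sennrich et al.'s algorithm (the tiny word-frequency dictionary in the usage note is invented for illustration and is not the Bible data from the study):

```python
# Minimal sketch of byte-pair-encoding (BPE) subword learning.
import re
from collections import Counter

def pair_counts(vocab):
    """Count adjacent symbol pairs, weighted by word frequency."""
    pairs = Counter()
    for word, freq in vocab.items():
        symbols = word.split()
        for pair in zip(symbols, symbols[1:]):
            pairs[pair] += freq
    return pairs

def apply_merge(pair, vocab):
    """Concatenate every occurrence of `pair` across the vocabulary."""
    # Whitespace lookarounds keep the match aligned to symbol boundaries.
    pattern = re.compile(r"(?<!\S)" + re.escape(" ".join(pair)) + r"(?!\S)")
    return {pattern.sub("".join(pair), word): freq for word, freq in vocab.items()}

def learn_bpe(word_freqs, num_merges):
    """Learn `num_merges` BPE merge operations from word frequencies."""
    vocab = {" ".join(word): freq for word, freq in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        pairs = pair_counts(vocab)
        if not pairs:
            break
        best = max(pairs, key=pairs.get)   # most frequent adjacent pair
        vocab = apply_merge(best, vocab)
        merges.append(best)
    return merges, vocab
```

For example, `learn_bpe({"low": 5, "lower": 2, "newest": 6, "widest": 3}, 3)` first merges `e`+`s` and then `es`+`t`, producing the subword "est". Because these merges are driven purely by frequency, they need not align with morpheme boundaries, which is exactly where linguistically motivated segmenters like Morfessor or FSTs can differ.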
Language evolution is driven by pressures for simplicity and informativity; however, the timescale on which these pressures operate is debated. Over several generations, learners’ biases for simple and informative systems can guide language evolution. Over repeated instances of dyadic communication, the principle of least effort dictates that speakers should bias systems towards simplicity and listeners towards informativity, similarly guiding language evolution. At the same time, it has been argued that learners only provide a bias for simplicity and, thus, language users must provide a bias for informativity. To what extent do languages evolve during acquisition versus use? We address this question by formally defining and investigating the communicative efficiency of acquisition trajectories. We illustrate our approach using colour-naming systems, replicating the communicative efficiency model of Zaslavsky, Kemp, Regier & Tishby (2018, PNAS) and the acquisition model of Beekhuizen & Stevenson (2018, Cogn. Sci.). We find that to the extent that language is iconic, learning alone is sufficient to shape language evolution. Regarding colour-naming systems specifically, we find that incorporating learning biases into communicative efficiency accounts might explain how speakers and listeners trade off communicative effort.
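The communicative efficiency at issue above can be made concrete. In the information-theoretic tradition that Zaslavsky et al. build on, one component of efficiency is a naming system's communicative cost: the expected surprisal of an idealized Bayesian listener who reconstructs the meaning from the word. The sketch below is an illustration under that framing only; the two-meaning toy systems in the test are invented and are far simpler than real colour-naming data.

```python
# Hedged sketch: communicative cost of a naming system as the expected
# surprisal H(M | W) of an idealized Bayesian listener.
import math

def communicative_cost(prior, speaker):
    """Expected surprisal, in bits, of a Bayesian listener.

    prior:   p(m), a list of meaning probabilities
    speaker: q(w | m), a row per meaning and a column per word
    """
    n_meanings, n_words = len(prior), len(speaker[0])
    # Joint p(m, w) and word marginal p(w)
    joint = [[prior[m] * speaker[m][w] for w in range(n_words)]
             for m in range(n_meanings)]
    p_w = [sum(joint[m][w] for m in range(n_meanings)) for w in range(n_words)]
    cost = 0.0
    for m in range(n_meanings):
        for w in range(n_words):
            if joint[m][w] > 0:
                listener = joint[m][w] / p_w[w]   # Bayes: L(m | w)
                cost += joint[m][w] * -math.log2(listener)
    return cost
```

A system that gives every meaning its own word has zero cost; a system that collapses meanings onto one word forces the listener to guess, raising the cost. Tracking this quantity along an acquisition trajectory is one way to formalize how efficient a learner's intermediate systems are.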
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations: citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.