We analyze the occurrence frequencies of over 15 million words recorded in millions of books published during the past two centuries in seven different languages. For all languages and chronological subsets of the data we confirm that two scaling regimes characterize the word frequency distributions, with only the more common words obeying the classic Zipf law. Using corpora of unprecedented size, we test the allometric scaling relation between the corpus size and the vocabulary size of growing languages to demonstrate a decreasing marginal need for new words, a feature that is likely related to the underlying correlations between words. We calculate the annual growth fluctuations of word use, which show a decreasing trend as the corpus size increases, indicating a slowdown in linguistic evolution following language expansion. This “cooling pattern” forms the basis of a third statistical regularity which, unlike the Zipf and Heaps laws, is dynamical in nature.
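Both static laws named in this abstract are power laws, so their exponents can be read off as slopes on log-log axes. The sketch below is illustrative only and is not the authors' procedure: it assumes a tokenized text held in memory, uses ordinary least squares on the logarithms (a simple estimator that is known to be biased for heavy-tailed data), and the names `fit_loglog_slope`, `zipf_exponent`, and `heaps_exponent` are hypothetical.

```python
import math
from collections import Counter


def fit_loglog_slope(xs, ys):
    """Least-squares slope of log(y) against log(x)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mx = sum(lx) / n
    my = sum(ly) / n
    sxx = sum((a - mx) ** 2 for a in lx)
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    return sxy / sxx


def zipf_exponent(tokens, max_rank=None):
    """Estimate zeta in f(r) ~ r^(-zeta) from the rank-frequency table.

    max_rank (an assumption of this sketch) restricts the fit to the most
    common words, where the abstract says the classic Zipf law holds.
    """
    freqs = sorted(Counter(tokens).values(), reverse=True)
    if max_rank is not None:
        freqs = freqs[:max_rank]
    ranks = range(1, len(freqs) + 1)
    return -fit_loglog_slope(ranks, freqs)


def heaps_exponent(tokens):
    """Estimate b in V(N) ~ N^b, tracking vocabulary size as the text grows."""
    seen = set()
    sizes, vocab = [], []
    for n, tok in enumerate(tokens, start=1):
        seen.add(tok)
        sizes.append(n)
        vocab.append(len(seen))
    return fit_loglog_slope(sizes, vocab)
```

A Heaps exponent below 1 corresponds to the "decreasing marginal need for new words" described above: each additional word of text adds, on average, fewer new vocabulary entries than the last.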
Earthquakes are a complex spatiotemporal phenomenon, the underlying mechanism for which is still not fully understood despite decades of research and analysis. We propose and develop a network approach to earthquake events. In this network, a node represents a spatial location while a link between two nodes represents similar activity patterns in the two different locations. The strength of a link is proportional to the strength of the cross correlation in activities of two nodes joined by the link. We apply our network approach to a Japanese earthquake catalog spanning the 14-year period 1985-1998. We find strong links representing large correlations between patterns in locations separated by more than 1000 kilometers, corroborating prior observations that earthquake interactions have no characteristic length scale. We find network characteristics not attributable to chance alone, including a large number of network links, high node assortativity, and strong stability over time.
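The network construction described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: each location is assumed to carry an evenly sampled activity series, link weight is taken to be the largest Pearson cross-correlation over a small range of lags, and a link is kept only above a threshold. The `threshold` and `max_lag` parameters and all function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sxx * syy)


def max_cross_correlation(x, y, max_lag):
    """Largest correlation between x(t) and y(t + lag) for 0 <= lag <= max_lag.

    Assumes max_lag is small enough that at least two samples overlap.
    """
    best = -1.0
    for lag in range(max_lag + 1):
        a = x[: len(x) - lag]
        b = y[lag:]
        best = max(best, pearson(a, b))
    return best


def build_network(series, threshold, max_lag=0):
    """Build weighted links between locations with similar activity patterns.

    series: dict mapping a location label to its activity time series.
    Returns a dict mapping (location_i, location_j) to the link weight.
    """
    nodes = sorted(series)
    links = {}
    for idx, i in enumerate(nodes):
        for j in nodes[idx + 1:]:
            # Check both lag directions, since either location may lead.
            weight = max(
                max_cross_correlation(series[i], series[j], max_lag),
                max_cross_correlation(series[j], series[i], max_lag),
            )
            if weight >= threshold:
                links[(i, j)] = weight
    return links
```

Whether the observed links exceed what chance alone would produce would still need a comparison against shuffled or otherwise randomized series, which this sketch does not attempt.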
We analyze the dynamic properties of 10^7 words recorded in English, Spanish and Hebrew over the period 1800-2008 in order to gain insight into the coevolution of language and culture. We report language-independent patterns useful as benchmarks for theoretical models of language evolution. A significantly decreasing (increasing) trend in the birth (death) rate of words indicates a recent shift in the selection laws governing word use. For new words, we observe a peak in the growth-rate fluctuations around 40 years after introduction, consistent with the typical entry time into standard dictionaries and the human generational timescale. Pronounced changes in the dynamics of language during periods of war show that word correlations, occurring across time and between words, are largely influenced by coevolutionary social, technological, and political factors. We quantify cultural memory by analyzing the long-term correlations in the use of individual words using detrended fluctuation analysis.
Statistical laws describing the properties of word use, such as Zipf's law [1-6] and Heaps' law [7,8], have been thoroughly tested and modeled. These statistical laws are based on static snapshots of written language, using empirical data aggregated over relatively small time periods and drawn from relatively small corpora, ranging in size from individual texts [1,2] to relatively small collections of topical texts [3,4]. However, language is a fundamentally dynamic complex system, consisting of heterogeneous entities at the level of the units (words) and the interacting users (us). Hence, we begin this paper with two questions: (i) Do languages exhibit dynamical patterns? (ii) Do individual words exhibit dynamical patterns? The coevolutionary nature of language requires analysis at both the macro and micro scale.
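Detrended fluctuation analysis, named above as the tool for quantifying long-term correlations in word use, can be sketched as follows. The source does not state the detrending order or the window sizes used, so this illustration assumes first-order (linear) detrending over non-overlapping windows; the function names are hypothetical. An exponent near 0.5 indicates an uncorrelated series, while larger values indicate persistent long-term correlations.

```python
import math


def _residual_ss(segment):
    """Sum of squared residuals after removing the least-squares line."""
    n = len(segment)
    mean_k = (n - 1) / 2
    mean_y = sum(segment) / n
    skk = sum((k - mean_k) ** 2 for k in range(n))
    sky = sum((k - mean_k) * (y - mean_y) for k, y in enumerate(segment))
    slope = sky / skk
    return sum(
        (y - mean_y - slope * (k - mean_k)) ** 2 for k, y in enumerate(segment)
    )


def dfa_fluctuation(series, window):
    """Root-mean-square fluctuation F(window) of the detrended profile.

    Assumes 3 <= window <= len(series); trailing samples that do not
    fill a whole window are ignored.
    """
    mean = sum(series) / len(series)
    profile, total = [], 0.0
    for value in series:
        total += value - mean  # cumulative sum of deviations from the mean
        profile.append(total)
    n_windows = len(profile) // window
    ss = sum(
        _residual_ss(profile[w * window:(w + 1) * window])
        for w in range(n_windows)
    )
    return math.sqrt(ss / (n_windows * window))


def dfa_exponent(series, windows):
    """Slope of log F(n) against log n over the given window sizes."""
    lx = [math.log(w) for w in windows]
    ly = [math.log(dfa_fluctuation(series, w)) for w in windows]
    mx = sum(lx) / len(lx)
    my = sum(ly) / len(ly)
    sxx = sum((a - mx) ** 2 for a in lx)
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    return sxy / sxx
```

For a word's usage history, `series` would be something like its annual growth rates, and `windows` a handful of sizes much shorter than the record, for example 4 to 32 years for a record of about two centuries.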
Here we apply interdisciplinary concepts to empirical language data collected in a massive book digitization effort by Google Inc., which recently unveiled a database of words in seven languages, after having scanned approximately 4% of the world's books. The massive “n-gram” project [9] allows for a novel view into the growth dynamics of word use and the birth and death processes of words in accordance with evolutionary selection laws [10]. A recent analysis of this database by Michel et al. [11] addresses numerous well-posed questions rooted in cultural anthropology using case studies of individual words. Here we take an alternative approach by analyzing the aggregate properties of the language dynamics recorded in the Google Inc. data in a systematic way, using the word counts of every word recorded over the 209-year time period 1800-2008 in the English, Spanish, and Hebrew text corpora. This period spans an incredibly rich cultural history that includes several international wars, revolutions, and numerous technological paradigm shifts. Together, the data comprise over 1 × 10^7 distinct words. We use concepts from economics to gain quantitative insights into the role of exogenous factors on the evolutio...