A broad range of complex physical and biological systems exhibit scaling laws. Human language is a complex system of word organization. Studies of written texts have revealed intriguing scaling laws that characterize the frequency of word occurrence, the rank of words, and the growth in the number of distinct words with text length. While studies have predominantly focused on language in its written form, such as books, little attention has been given to the structure of spoken language. Here we investigate a database of spoken language transcripts and written texts, and we find that word organization in both spoken language and written texts exhibits scaling laws, although with different crossover regimes and scaling exponents. We propose a model that provides insight into word organization in spoken language and written texts, and successfully accounts for all scaling laws empirically observed in both language forms.
Scaling laws characterize diverse complex systems in a broad range of fields, including physics, biology, finance, and social science. Human language is another example of a complex system of word organization. Studies on written texts have shown that scaling laws characterize the occurrence frequency of words, word rank, and the growth in the number of distinct words with increasing text length. However, these studies have mainly concentrated on Western linguistic systems, and the laws that govern the lexical organization, structure, and dynamics of the Chinese language remain poorly understood. Here we study a database of Chinese and English language books. We report that three distinct scaling laws characterize word organization in the Chinese language. We find that these scaling laws have different exponents and crossover behaviors compared to English texts, indicating different word organization and word dynamics in the process of text growth. We propose a stochastic feedback model of word organization and text growth, which successfully accounts for the empirically observed scaling laws with their corresponding scaling exponents and characteristic crossover regimes. Further, by varying key model parameters, we reproduce differences in the organization and scaling laws of words between the Chinese and English languages. We also identify functional relationships between model parameters and the empirically observed scaling exponents, thus providing new insights into word organization and growth dynamics in the Chinese and English languages.
Human language is a typical complex system, and its organization and evolution are an attractive topic for researchers in both the physical and cultural sciences. In this paper, we present the first exhaustive analysis of the text organization of human speech. Two important results are that: (i) the construction and organization of spoken language can be characterized by Zipf's law and Heaps' law, as observed in written texts; (ii) the word frequency vs. rank distribution and the growth of distinct words with increasing text length show significant differences between books and speech. In speech, the word frequency distribution is more concentrated on higher-frequency words, and the emergence of new words decreases much more rapidly as the content length grows. Based on these observations, a new generalized model is proposed to explain these complex dynamical behaviors and the differences between speech and books.
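To make the two quantities named above concrete, here is a minimal sketch of how a rank-frequency list (Zipf's law) and a vocabulary-growth curve (Heaps' law) might be measured from a token sequence. The function names, the whitespace tokenizer, and the toy sentence are assumptions made for illustration and are not taken from any of the papers. A single least-squares slope is also a simplification: it cannot capture the crossover regimes the abstracts describe.

```python
import math
from collections import Counter


def rank_frequency(tokens):
    """Return word counts sorted from most to least frequent (rank 1 first)."""
    return sorted(Counter(tokens).values(), reverse=True)


def vocabulary_growth(tokens):
    """Return the number of distinct words seen after each token."""
    seen = set()
    growth = []
    for tok in tokens:
        seen.add(tok)
        growth.append(len(seen))
    return growth


def loglog_slope(xs, ys):
    """Least-squares slope of log(y) against log(x)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mx = sum(lx) / n
    my = sum(ly) / n
    num = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    den = sum((a - mx) ** 2 for a in lx)
    return num / den


# Toy input (illustrative only); real analyses use whole books or transcripts.
text = "the cat sat on the mat and the dog sat on the log"
tokens = text.split()

freqs = rank_frequency(tokens)      # frequency as a function of rank
growth = vocabulary_growth(tokens)  # distinct words as a function of length

# Rough exponent estimates from a single straight-line fit in log-log space.
zipf_slope = loglog_slope(range(1, len(freqs) + 1), freqs)
heaps_slope = loglog_slope(range(1, len(growth) + 1), growth)
```

On a real corpus, the same two lists would typically be plotted on log-log axes and fitted separately in each regime rather than with one global slope.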
The output of a healthy physiological system exhibits complex fluctuations. Nonlinear analysis, such as the study of power-law characteristics, shows potential for detecting changes in biological complexity with disease and aging. This paper characterizes the heart rate variability (HRV) of aging subjects and patients with congestive heart failure (CHF) using three types of distribution: Zipf's law, Heaps' law, and the frequency distribution. All data analysis and modeling are based on a constructed sequence, that is, the monotonic increase to monotonic decrease amplitude ratios derived from heartbeat interval data. The experimental results show a significant decrease in HRV from healthy young people to healthy elderly people to CHF patients. We propose a model that takes account of the "rich-get-richer" theory in the experimental observations, and it successfully reproduces the three types of distribution characterizing the constructed ratio sequences obtained from the analysis of measured cardiac data. This work provides insight into the dynamic mechanism of cardiac data underlying autonomic nervous regulation.
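The abstract does not specify the model, so the following is only a generic illustration of a "rich-get-richer" growth process: a Simon-style process, swapped in here as a stand-in and not the authors' model. At each step, a new symbol is introduced with a fixed probability; otherwise an earlier element of the sequence is copied, which selects each existing symbol with probability proportional to its current count. The function name, parameter names, and values are hypothetical.

```python
import random


def simon_sequence(length, new_word_prob, seed=0):
    """Generate a symbol sequence with a Simon-style rich-get-richer rule.

    This is an illustrative stand-in, not the model from the paper.
    """
    rng = random.Random(seed)
    sequence = [0]  # start with a single symbol, labeled 0
    next_id = 1
    while len(sequence) < length:
        if rng.random() < new_word_prob:
            # Introduce a symbol that has never appeared before.
            sequence.append(next_id)
            next_id += 1
        else:
            # Copy a uniformly chosen earlier element, so frequent symbols
            # are proportionally more likely to be repeated.
            sequence.append(rng.choice(sequence))
    return sequence


# Hypothetical parameter values, chosen only for the example.
seq = simon_sequence(5000, new_word_prob=0.1)
```

The resulting sequence can be examined with the same rank-frequency and distinct-symbol-growth measurements that the abstract applies to the constructed ratio sequences.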