2016
DOI: 10.1371/journal.pone.0168971

Model of the Dynamic Construction Process of Texts and Scaling Laws of Words Organization in Language Systems

Abstract: Scaling laws characterize diverse complex systems in a broad range of fields, including physics, biology, finance, and social science. Human language, in which words are organized into texts, is another example of a complex system. Studies on written texts have shown that scaling laws characterize the occurrence frequency of words, word rank, and the growth of distinct words with increasing text length. However, these studies have mainly concentrated on Western linguistic systems, and the laws that govern the lexical org…
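Two of the scaling relations named in the abstract can be read directly off a tokenized text: word frequency against rank, and the number of distinct words against text length. The Python sketch below is an illustration under simple assumptions, not the authors' pipeline. It assumes whitespace tokenization, which does not suit unsegmented Chinese text, where a word segmenter would be needed first. The function names are hypothetical.

```python
from collections import Counter

def rank_frequency(tokens):
    """Return word frequencies sorted in descending order (index 0 = rank 1)."""
    return sorted(Counter(tokens).values(), reverse=True)

def vocabulary_growth(tokens):
    """Return V(n), the number of distinct words among the first n tokens, for n = 1..N."""
    seen = set()
    growth = []
    for tok in tokens:
        seen.add(tok)
        growth.append(len(seen))
    return growth

if __name__ == "__main__":
    # Assumption: whitespace tokenization of a toy English sentence.
    tokens = "the cat sat on the mat and the dog sat on the log".split()
    print(rank_frequency(tokens))     # [4, 2, 2, 1, 1, 1, 1, 1]
    print(vocabulary_growth(tokens))  # [1, 2, 3, 4, 4, 5, 6, 6, 7, 7, 7, 7, 8]
```

Plotting the first list against rank, and the second against n, on log-log axes gives the Zipf and Heaps curves whose slopes the studies below report.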


citations
Cited by 7 publications
(13 citation statements)
references
References 26 publications
“…The law is a special case of the scale-free distribution that it explains, which pervades the rich-get-richer behavior of connections in many biological networks, including those describing metabolism and protein-protein interactions (Barabási, 2009). The Zipf law is followed by structural domains at fold and FSF levels (Qian et al, 2001; Caetano-Anollés and Caetano-Anollés, 2003), with γ decay values of ~2 for Bacteria and Archaea and ~1.4 for Eukarya (Caetano-Anollés and Caetano-Anollés, 2003) matching values for the English and Chinese languages, respectively (Li et al, 2016). Domain structure is also subject to functional type laws that link two kinds of variables.…”
Section: Results (mentioning)
confidence: 99%
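A quick way to estimate a decay exponent such as γ is to fit a least-squares line on log-log axes. The sketch below assumes the rank-frequency form f(r) ∝ r^(-γ). The passage does not say whether its γ values refer to a rank-frequency or a frequency-distribution exponent, so treat that form as an assumption. The helper names are hypothetical. Log-log least squares is simple but biased on noisy tails, and maximum-likelihood estimators are generally preferred for careful fits.

```python
import math

def loglog_slope(xs, ys):
    """Ordinary least-squares slope of log(y) against log(x)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx = sum(lx) / len(lx)
    my = sum(ly) / len(ly)
    cov = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    var = sum((a - mx) ** 2 for a in lx)
    return cov / var

def zipf_exponent(freqs):
    """Estimate gamma in f(r) ~ r**(-gamma) from frequencies sorted in descending order."""
    ranks = range(1, len(freqs) + 1)
    return -loglog_slope(ranks, freqs)

if __name__ == "__main__":
    # Synthetic check only: frequencies generated exactly as 1000 * r**-1.4.
    freqs = [1000 * r ** -1.4 for r in range(1, 101)]
    print(round(zipf_exponent(freqs), 6))  # 1.4
```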
“…Recent studies of languages with limited dictionary sizes such as Chinese, Japanese, and Korean (Petersen et al, 2012; Lü et al, 2013) have shown multi-regime Heaps' laws. A recent study shows that Chinese text follows a three-regime Heaps' law with β scaling exponents of 1, 0.7, and 0.3 for increasing text lengths, which is explained by a stochastic feedback model of vocabulary growth driven by two probabilities, one for the reuse of frequently used words and the other for the rise of word novelties (Li et al, 2016).…”
Section: Results (mentioning)
confidence: 99%
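The two-probability feedback mechanism described in this passage can be sketched as a Simon-style copying process. This is a hypothetical model, not the one in Li et al (2016). The novelty probability `p_new * (1 - V / dict_size)` and the finite dictionary size `dict_size` are assumptions chosen to mimic a language with a limited inventory of words. Reuse copies an earlier token chosen uniformly at random, which selects each word with probability proportional to its current frequency (the rich-get-richer term).

```python
import random

def simulate_text(n_tokens, p_new=0.1, dict_size=5000, seed=0):
    """Hypothetical two-probability vocabulary growth model.

    At each step a new word is introduced with probability
    p_new * (1 - V / dict_size), where V is the current vocabulary size,
    so novelty dries up as the finite dictionary is exhausted.
    Otherwise an earlier token is copied uniformly at random, which reuses
    each word with probability proportional to its current frequency.
    Returns the token sequence (integer word ids) and V(n) for n = 1..n_tokens.
    """
    rng = random.Random(seed)
    tokens = [0]  # the text starts with one word (id 0)
    vocab = 1
    growth = [1]
    for _ in range(n_tokens - 1):
        if rng.random() < p_new * (1 - vocab / dict_size):
            tokens.append(vocab)  # novelty: the next unused word id
            vocab += 1
        else:
            tokens.append(rng.choice(tokens))  # frequency-proportional reuse
        growth.append(vocab)
    return tokens, growth

if __name__ == "__main__":
    tokens, growth = simulate_text(200_000)
    for n in (10, 100, 1_000, 10_000, 100_000, 200_000):
        print(n, growth[n - 1])
```

Because novelty shrinks as V approaches `dict_size`, the local slope of log V(n) against log n falls with text length. This is qualitatively like the regimes described above, but the parameter values are not calibrated to reproduce the reported exponents.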
“…Figure 2B shows scatter log-log plots describing the relationship between the vocabulary of protein domains defined at fold superfamily (FSF) level of structural abstraction and the database of FSFs in proteomes for all 1,995 FSFs or only for the 442 FSFs that are common to all superkingdoms and viruses. The plots reveal a four-regime Heaps' law of vocabulary growth describing a decreasing marginal need for new words and an evolutionary slowdown (cooling) that is similar to that of vocabularies for Indo-European, Chinese, Japanese, and Korean languages (Petersen et al, 2012; Lü et al, 2013; Li et al, 2016). The four individual regimes of allometric scaling corresponded to the proteomes of viruses, Archaea, Bacteria, and Eukarya, in that order (Figure 2B), showing increasing slowdown of vocabulary growth with β scaling exponents decreasing from 0.81 to 0.12-0.26.…”
Section: Language Laws Are Constrained By the Engineering Of Biological Systems (mentioning)
confidence: 97%
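Multi-regime Heaps' laws, such as the four-regime pattern described in this passage, are summarized by fitting V(n) ∝ n^β separately within each regime. The sketch below assumes the regime breakpoints are already known and supplied by hand. The passage does not say how the regimes were located, and segmented regression would be one option. The demo uses synthetic data built from the exponents 1, 0.7, and 0.3 reported for Chinese text, not real corpus or proteome data. The function names are hypothetical.

```python
import math

def fit_heaps_regimes(ns, vs, breakpoints):
    """Fit V(n) ~ n**beta separately on each regime by log-log least squares.

    ns, vs: text lengths and vocabulary sizes (all > 0).
    breakpoints: text lengths separating regimes (assumed known in advance).
    Returns one beta per regime, ordered by increasing text length.
    """
    edges = [0.0] + sorted(breakpoints) + [math.inf]
    betas = []
    for lo, hi in zip(edges, edges[1:]):
        pts = [(math.log(n), math.log(v)) for n, v in zip(ns, vs) if lo < n <= hi]
        if len(pts) < 2:
            raise ValueError(f"regime ({lo}, {hi}] has fewer than two points")
        mx = sum(x for x, _ in pts) / len(pts)
        my = sum(y for _, y in pts) / len(pts)
        num = sum((x - mx) * (y - my) for x, y in pts)
        den = sum((x - mx) ** 2 for x, _ in pts)
        betas.append(num / den)
    return betas

def synthetic_heaps(n):
    """Hypothetical piecewise power law with exponents 1, 0.7, 0.3, continuous at the breaks."""
    if n <= 1e2:
        return n
    if n <= 1e4:
        return 1e2 * (n / 1e2) ** 0.7
    return 1e2 * (1e2) ** 0.7 * (n / 1e4) ** 0.3

if __name__ == "__main__":
    ns = [10 ** (k / 10) for k in range(61)]  # text lengths from 1 to 10**6
    vs = [synthetic_heaps(n) for n in ns]
    print([round(b, 3) for b in fit_heaps_regimes(ns, vs, [1e2, 1e4])])  # [1.0, 0.7, 0.3]
```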
“…This interplay materialized in four regimes of allometric scaling reflected in a Heaps' law of vocabulary growth [111]. These regimes explained increasing economies of scale in the evolutionary growth and accretion of kernel proteome repertoires, which resembled the growth of human languages with limited vocabulary sizes, such as Korean or Chinese (e.g., Li et al [118]). Results reconcile dynamic and static views of frequency distributions of protein domains that are consistent with the axiom of continuity that is a cornerstone of evolutionary thinking and ToL reconstruction.…”
Section: Benefits and Emergent Properties Of Phylogenomic Abundance (mentioning)
confidence: 99%