2017
DOI: 10.1103/physreve.96.022318

Dependence of exponents on text length versus finite-size scaling for word-frequency distributions

Abstract: Some authors have recently argued that a finite-size scaling law for the text-length dependence of word-frequency distributions cannot be conceptually valid. Here we give solid quantitative evidence for the validity of this scaling law, using both careful statistical tests and analytical arguments based on the generalized central-limit theorem applied to the moments of the distribution (and obtaining a novel derivation of Heaps' law as a by-product). We also find that the picture of word-frequency distribution…

citations
Cited by 12 publications
(11 citation statements)
references
References 41 publications
(113 reference statements)
“…On the other side, the theory of scaling analysis, following the authors of [21, 29], allows us to compare the shape of the conditional distributions for different values of ℓ. This theory has revealed a very powerful tool in quantitative linguistics, allowing in previous research to show that the shape of the word-frequency distribution does not change as a text increases its length [30, 31].…”
Section: Corpus and Statistical Methods (mentioning)
confidence: 99%
“…We can use a variation of the logarithmic-coefficient-of-variation test (in fact, its original linear form, essentially) to rule out that the distribution of cluster size has an exponential tail, as was claimed in other contexts [43] (and already criticized in Refs. [44, 45]). If we compute the usual residual In all cases the first crossing below the fifth percentile takes place for n_cv around 100, which corresponds to n_PL in Table I.…”
Section: B Residual Logarithmic Coefficient Of Variation (mentioning)
confidence: 99%
“…Another source of variation in the value of the resulting system size L_tot is that this arises as a sum of independent power-law distributed sizes n. As the exponent of the power law γ is smaller than 2, the law of large numbers does not apply and the sum is not scaling linearly with the number of terms (types) V_tot. Instead, the sum is broadly distributed, as expected from the generalized central limit theorem [64–66]. Table IV provides the results obtained from other examples simulating Zipf's law for sizes [Eq.…”
Section: Simulation Of Zipf's Law With Size As the Random Variable (mentioning)
confidence: 86%
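
The point made in the last citation statement, that a sum of independent power-law sizes with exponent γ below 2 does not grow linearly with the number of terms, can be checked numerically. The sketch below is only an illustration under stated assumptions, not the procedure of the cited papers: it uses a continuous Pareto distribution (via the standard-library `random.paretovariate`) in place of discrete word counts, it reads γ as the exponent of the probability density (so the tail exponent is γ − 1), and the function names `median_total_size` and `growth_exponent` are hypothetical.

```python
import math
import random
import statistics


def median_total_size(num_types, gamma, trials=201, seed=0):
    """Median, over `trials` runs, of the sum of `num_types` power-law sizes.

    Assumption: sizes are continuous Pareto variables with density
    proportional to x**(-gamma) for x >= 1, i.e. tail exponent gamma - 1.
    """
    rng = random.Random(seed)
    alpha = gamma - 1.0  # shape parameter expected by paretovariate
    totals = [
        sum(rng.paretovariate(alpha) for _ in range(num_types))
        for _ in range(trials)
    ]
    # The median is used because the mean does not exist for gamma < 2.
    return statistics.median(totals)


def growth_exponent(v_small, v_large, gamma, trials=201, seed=0):
    """Effective exponent b in total ~ V**b, estimated from two values of V."""
    small = median_total_size(v_small, gamma, trials, seed)
    large = median_total_size(v_large, gamma, trials, seed + 1)
    return math.log(large / small) / math.log(v_large / v_small)


if __name__ == "__main__":
    # Linear scaling would give an exponent of 1.
    print(growth_exponent(100, 10_000, gamma=1.8))
```

For γ = 1.8 the generalized central limit theorem suggests that the typical sum grows asymptotically as V^(1/(γ−1)) = V^1.25 rather than as V, so the printed exponent should come out clearly above 1; the exact value depends on the seed and on finite-size corrections.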