1992
DOI: 10.1109/18.165464

Random texts exhibit Zipf's-law-like word frequency distribution

Abstract: It is shown that the distribution of word frequencies for randomly generated texts is very similar to Zipf's law observed in natural languages such as English. The fact that the frequency of occurrence of a word is almost an inverse power-law function of its rank, with an exponent very close to 1, is largely due to the transformation from the word's length to its rank, which stretches an exponential function into a power-law function.
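To make the mechanism concrete, here is a minimal simulation sketch (not code from the paper): Miller-style random typing, with the alphabet size M = 4 and all names chosen here for illustration. It tabulates how a word's frequency falls, and its rank grows, exponentially with its length.

```python
import math
import random
from collections import Counter

# Miller-style "monkey typing": every keystroke is one of M equiprobable
# letters or the space bar; spaces delimit words. A small alphabet (M = 4)
# keeps every short word well sampled in a modest simulation.
M, ALPHABET = 4, "abcd"

def monkey_text(n_keystrokes, seed=1):
    rng = random.Random(seed)
    keys = ALPHABET + " "
    return "".join(rng.choice(keys) for _ in range(n_keystrokes))

counts = Counter(monkey_text(2_000_000).split())
ranked = counts.most_common()  # (word, frequency), most frequent first

# Frequency decays exponentially with word length (a factor of 1/(M+1)
# per extra letter), while rank grows exponentially (M**L possible words
# of length L), so frequency is roughly a power law in rank with
# exponent alpha = ln(M+1)/ln(M).
first_rank = 1
for length in range(1, 7):
    # the block of ranks that length-`length` words approximately occupy
    block = ranked[first_rank - 1 : first_rank - 1 + M**length]
    mean_freq = sum(freq for _, freq in block) / len(block)
    print(f"L={length}: ranks {first_rank}-{first_rank + M**length - 1}, "
          f"mean frequency {mean_freq:9.2f}")
    first_rank += M**length

print("predicted exponent:", math.log(M + 1) / math.log(M))  # ~1.16
```

Eliminating the length L between the two exponentials gives frequency ∝ rank^(−ln(M+1)/ln M); the exponent is about 1.16 for M = 4 and about 1.01 for a 26-letter alphabet, i.e., very close to 1, as the abstract states.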

Cited by 375 publications (342 citation statements)
References 3 publications
“…It has been shown that this Zipf-Mandelbrot law is also obeyed by so many random processes [69,70] that any specially interesting character for linguistic studies has sometimes been ruled out. Nevertheless, it has been argued that it is possible to discriminate between human writings [71] and stochastic versions of texts precisely by looking at statistical properties of words that fall where Eq.…”
Section: Zipf Methods
mentioning
confidence: 99%
“…This argument was first introduced by Miller [19,20] to explain the power-law distribution of words in texts. Here we study another simple way to extract a power law, in this case from a fluctuating or time-dependent Poisson process.…”
mentioning
confidence: 99%
“…By employing linear regression, we estimate the approximate value of s to be 1.053, which follows the empirical observations that s → 1 and H ∝ (|Q| + |T|) when applied to human language [11]. We derive a final, concise relationship:…”
Section: Token-based Canopies
mentioning
confidence: 99%
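The passage quoted above estimates the Zipf exponent s by linear regression on log-log rank-frequency data. Below is a minimal sketch of that standard fit, assuming ordinary least squares; the function name and input file are illustrative, not taken from the cited paper.

```python
import math
from collections import Counter

def zipf_exponent(tokens):
    """Estimate the Zipf exponent s by ordinary least squares on the
    log-log rank-frequency data: log f = c - s * log r."""
    freqs = sorted(Counter(tokens).values(), reverse=True)
    xs = [math.log(r) for r in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return -sxy / sxx  # the fitted slope is -s

# Illustrative usage on any whitespace-tokenized text:
with open("corpus.txt") as f:  # hypothetical input file
    tokens = f.read().lower().split()
print(f"estimated s = {zipf_exponent(tokens):.3f}")
```

Note that least-squares fits on log-log data are easily biased by the long tail of words seen only once; when precision matters, maximum-likelihood estimators are generally preferred.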