Discovering power laws in computer programs

Zhang, Hongyu

doi:10.1016/j.ipm.2009.02.001

Cited by 13 publications

(9 citation statements)

References 38 publications


“…The occurrences of tags in online resources [19,20], keywords in scientific publications [21] and words in web pages resulted from web searching [22] also simultaneously display the Zipf's and Heaps' laws. Interestingly, even the identifiers in programs by Java, C++ and C languages exhibit the same scaling laws [23].…”

mentioning

confidence: 94%

Deviation of Zipf's and Heaps' Laws in Human Languages with Limited Dictionary Sizes

Lü

Zhang

Zhou

2013

Sci Rep


Zipf's law on word frequency and Heaps' law on the growth of distinct words are observed in the Indo-European language family, but they do not hold for languages like Chinese, Japanese and Korean. These languages consist of characters, and have very limited dictionary sizes. Extensive experiments show that: (i) The character frequency distribution follows a power law with exponent close to one, at which the corresponding Zipf's exponent diverges. Indeed, the character frequency decays exponentially in the Zipf's plot. (ii) The number of distinct characters grows with the text length in three stages: It grows linearly in the beginning, then turns to a logarithmic form, and eventually saturates. A theoretical model for the writing process is proposed, which embodies the rich-get-richer mechanism and the effects of limited dictionary size. Experiments, simulations and analytical solutions agree well with each other. This work refines the understanding of Zipf's and Heaps' laws in human language systems.


“…[2-4] As such, Zipf's law has attracted a great deal of attention, but the general mechanism of Zipf's law remains unclear. In quantitative linguistics, Zipf's exponent, α, is evaluated for various languages, such as English and Russian [5], and for programming languages, such as Java and C [6], and is known in many cases to be approximately 1. Heaps' law [7] states that the number of distinct words increases nonlinearly as the total number of words in a document increases.…”

Section: §1 Introduction

mentioning

confidence: 99%

Zipf's Law and Heaps' Law Can Predict the Size of Potential Words

Sano

Takayasu

2012

Prog. Theor. Phys. Suppl.


“…For all component systems whose realizations have this temporal ordering of components, it is possible to evaluate over a single realization how the number of different components h grows with the realization size m. More in general, the same scaling can be analyzed for component systems even without a natural ordering of components (for example for genomes as composed by genes or for LEGO toys) if the sizes M of the available realizations span a sufficiently large range. As discussed in the Introduction, in several empirical systems this quantity follows a sublinear and approximately power-law function h(m) ∝ m^ν (with ν < 1), known as Heaps' law [5,17,28-32]. Each run of the SSR process also generates an ordered sequence of components (or visited states), and the question is what is the predicted scaling of h(m) for this stochastic process.…”

Section: Results

mentioning

confidence: 99%

“…This law describes the sublinear growth of the number of different components (i.e. the observed vocabulary) with the system size (i.e., the total number of components), and has been observed in several empirical systems from linguistics to genomics [5,28-32]. In models based on equilibrium ensembles, such as the random-group-formation model [22], the vocabulary is typically a fixed parameter, thus this scaling cannot be addressed straightforwardly.…”

Section: Introduction

mentioning

confidence: 99%

Heaps' law, statistics of shared components, and temporal patterns from a sample-space-reducing process

et al. 2018


Zipf's law is a hallmark of several complex systems with a modular structure, such as books composed of words or genomes composed of genes. In these component systems, Zipf's law describes the empirical power law distribution of component frequencies. Stochastic processes based on a sample-space-reducing (SSR) mechanism, in which the number of accessible states reduces as the system evolves, have been recently proposed as a simple explanation for the ubiquitous emergence of this law. However, many complex component systems are characterized by other statistical patterns beyond Zipf's law, such as a sublinear growth of the component vocabulary with the system size, known as Heaps' law, and specific statistics of shared components. This work shows, with analytical calculations and simulations, that these statistical properties can emerge jointly from an SSR mechanism, thus making it an appropriate parameter-poor representation for component systems. Several alternative (and equally simple) models, for example based on the preferential attachment mechanism, can also reproduce Heaps' and Zipf's laws, suggesting that additional statistical properties should be taken into account to select the most-likely generative process for a specific system. Along this line, we will show that the temporal component distribution predicted by the SSR model is markedly different from the one emerging from the popular rich-gets-richer mechanism. A comparison with empirical data from natural language indicates that the SSR process can be chosen as a better candidate model for text generation based on this statistical property. Finally, a limitation of the SSR model in reproducing the empirical "burstiness" of word appearances in texts will be pointed out, thus indicating a possible direction for extensions of the basic SSR process.
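As a rough illustration of the two quantities these abstracts refer to, the sketch below computes a Heaps-style vocabulary growth curve and a Zipf-style rank-frequency list from an ordered token sequence. It is a minimal sketch, not the method of any of the cited papers: the function names `heaps_curve` and `zipf_ranking` and the toy token list are assumptions made for illustration only.

```python
from collections import Counter


def heaps_curve(tokens):
    """Number of distinct tokens seen after each of the first m tokens (hypothetical helper)."""
    seen = set()
    curve = []
    for tok in tokens:
        seen.add(tok)
        curve.append(len(seen))
    return curve


def zipf_ranking(tokens):
    """Token frequencies sorted in decreasing order, rank 1 first (hypothetical helper)."""
    return sorted(Counter(tokens).values(), reverse=True)


# Toy input (an assumption); real studies use words, characters, or program identifiers.
tokens = "a b a c a b d a".split()
print(heaps_curve(tokens))   # [1, 2, 2, 3, 3, 3, 4, 4]
print(zipf_ranking(tokens))  # [4, 2, 1, 1]
```

On real data, a log-log plot of the first list against m and of the second list against rank is the usual way to check for approximately power-law behaviour; fitting the exponents is a separate step not shown here.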
