2006
DOI: 10.1073/pnas.0510673103

Hierarchical structures induce long-range dynamical correlations in written texts

Abstract: Thoughts and ideas are multidimensional and often concurrent, yet they can be expressed surprisingly well sequentially by the translation into language. This reduction of dimensions occurs naturally but requires memory and necessitates the existence of correlations, e.g., in written text. However, correlations in word appearance decay quickly, while previous observations of long-range correlations using random walk approaches yield little insight on memory or on semantic context. Instead, we study combinations…
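The abstract's claim that word-appearance correlations decay quickly can be made concrete with a small measurement. The sketch below is a toy illustration under my own assumptions (the `occurrence_series` and `autocorrelation` helpers and the token list are invented for the example), not the paper's actual procedure:

```python
# Toy sketch: how quickly does the autocorrelation of a single word's
# occurrence series decay along a text?
import numpy as np

def occurrence_series(tokens, word):
    """Binary series: 1 where `word` appears, 0 elsewhere."""
    return np.array([1.0 if t == word else 0.0 for t in tokens])

def autocorrelation(x, max_lag):
    """Sample autocorrelation of x at lags 1..max_lag."""
    x = x - x.mean()
    var = np.dot(x, x)
    return [np.dot(x[:-k], x[k:]) / var for k in range(1, max_lag + 1)]

# Made-up token list; real analyses use book-length corpora.
tokens = "the cat sat on the mat and the dog sat on the rug".split()
acf = autocorrelation(occurrence_series(tokens, "the"), max_lag=5)
print(acf)
```

For a book-length text, plotting this autocorrelation against lag would show the fast word-level decay that the abstract contrasts with the long-range structure induced by hierarchical organization.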

Cited by 71 publications (87 citation statements)
References 18 publications
“…Although language sequences are not produced by a stochastic source, it is generally assumed that a large collection of language samples represents an ensemble with enough consistency in its statistical structure to allow the application of the standard formalism of information theory. However, one serious hurdle in computing the entropy of language based on the estimation of block probabilities is the presence of long-range correlations that span from hundreds to thousands of words [20][21][22][23][24]. The sample size that would be needed to estimate the required probabilities grows exponentially with block length, thus quickly rendering insufficient any available linguistic source.…”
Section: Universality in the Entropy of Word Ordering
confidence: 99%
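The hurdle this excerpt describes is easy to see in code. Below is a minimal sketch, under my own assumptions (the helper name and the toy sentence are invented), of the block-entropy estimate in question; the number of distinct n-word blocks, and hence the sample size needed to estimate their probabilities, grows roughly exponentially with n:

```python
# Sketch of a block-entropy estimate: count n-word blocks and compute
# H_n = -sum p log2 p over the empirical block distribution.
from collections import Counter
import math

def block_entropy(tokens, n):
    """Shannon entropy (bits) of the empirical distribution of n-grams."""
    blocks = Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    total = sum(blocks.values())
    return -sum((c / total) * math.log2(c / total) for c in blocks.values())

tokens = "to be or not to be that is the question".split()
for n in (1, 2, 3):
    print(n, round(block_entropy(tokens, n), 3))
```

With a realistic vocabulary, the n-gram space explodes combinatorially, so for large n almost every observed block is unique and the empirical probabilities become meaningless, which is exactly why long-range correlations make direct entropy estimation infeasible.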
“…Numerous studies demonstrate LRD, for example in geological and climate research (Scheffer et al., 2009; Varotsos and Kirk-Davidoff, 2006), financial market fluctuations (Matteo et al., 2005; Robinson, 2003), Internet modeling and network traffic analysis (Abry et al., 2002; Karagiannis et al., 2004; Riedi et al., 1999), and statistical analyses of human language (Alvarez-Lacalle et al., 2006; Petersen et al., 2012).…”
Section: Background - Long-Range Dependency in Complex Systems
confidence: 99%
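As a concrete companion to this excerpt, here is a hedged sketch of detrended fluctuation analysis (DFA), one standard way such studies quantify long-range dependence; the function name and parameters are mine, and this is an illustration rather than any cited paper's exact pipeline. The fitted scaling exponent alpha relates to the Hurst exponent (alpha ~ 0.5 for uncorrelated noise, alpha > 0.5 for persistent LRD):

```python
# Sketch of detrended fluctuation analysis (DFA).
import numpy as np

def dfa(x, scales):
    """Return the DFA fluctuation F(s) for each window size s in `scales`."""
    y = np.cumsum(x - np.mean(x))               # integrated profile
    fluct = []
    for s in scales:
        n_win = len(y) // s                     # non-overlapping windows
        rms = []
        for w in range(n_win):
            seg = y[w * s:(w + 1) * s]
            t = np.arange(s)
            coef = np.polyfit(t, seg, 1)        # local linear detrend
            rms.append(np.sqrt(np.mean((seg - np.polyval(coef, t)) ** 2)))
        fluct.append(np.mean(rms))
    return np.array(fluct)

rng = np.random.default_rng(0)
x = rng.standard_normal(4096)                   # white noise: expect alpha ~ 0.5
scales = np.array([16, 32, 64, 128, 256])
F = dfa(x, scales)
alpha = np.polyfit(np.log(scales), np.log(F), 1)[0]
print(f"alpha ~ {alpha:.2f}")
```

On a word-level series extracted from a long text, alpha significantly above 0.5 at large scales would be the LRD signature these studies report.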
“…In order to model turbulent flows, fractional Brownian motion (fBm), a generalization of the better-known Brownian motion, was introduced many decades ago [41,42], and it has become one of the most studied stochastic processes, widely used in a variety of fields including physics, probability, statistics, hydrology, economics, biology, and many others [43][44][45][46][47][48][49]. An fBm is a self-similar Gaussian process with stationary increments (called fractional Gaussian noise, fGn) and possesses long-range linear correlations whose strength depends on a parameter called the Hurst exponent, H [50], where 0 < H < 1.…”
Section: Introduction
confidence: 99%
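To make this excerpt's definitions concrete: fGn with Hurst exponent H has autocovariance gamma(k) = (1/2)(|k+1|^{2H} - 2|k|^{2H} + |k-1|^{2H}), and cumulatively summing fGn yields an fBm path. The sketch below is my own code (exact but O(n^3), so suitable only for small n, not a production sampler); it draws fGn by Cholesky-factorizing that covariance:

```python
# Sketch: sample fractional Gaussian noise (fGn) with Hurst exponent H,
# then cumulatively sum it to obtain a fractional Brownian motion path.
import numpy as np

def fgn(n, H, rng):
    """Sample n points of fGn with Hurst exponent H (exact, O(n^3))."""
    k = np.arange(n)
    # fGn autocovariance: gamma(k) = 0.5*(|k+1|^{2H} - 2|k|^{2H} + |k-1|^{2H})
    gamma = 0.5 * (np.abs(k + 1) ** (2 * H) - 2 * np.abs(k) ** (2 * H)
                   + np.abs(k - 1) ** (2 * H))
    cov = gamma[np.abs(k[:, None] - k[None, :])]     # Toeplitz covariance
    L = np.linalg.cholesky(cov + 1e-12 * np.eye(n))  # jitter for stability
    return L @ rng.standard_normal(n)

rng = np.random.default_rng(0)
noise = fgn(512, H=0.7, rng=rng)   # H > 0.5: persistent increments
path = np.cumsum(noise)            # fBm sample path
print(path[:5])
```

With H = 0.7 the increments are positively correlated (persistent), matching the 0 < H < 1 parameterization in the excerpt; for long series, a circulant-embedding (Davies-Harte) sampler would be the usual O(n log n) choice.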