2016
DOI: 10.1209/0295-5075/113/18002

Scaling laws and model of words organization in spoken and written language

Abstract: A broad range of complex physical and biological systems exhibits scaling laws. The human language is a complex system of words organization. Studies of written texts have revealed intriguing scaling laws that characterize the frequency of words occurrence, rank of words, and growth in the number of distinct words with text length. While studies have predominantly focused on the language system in its written form, such as books, little attention is given to the structure of spoken language. Here we investigat…
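
The quantities named in the abstract (word frequency by rank, and growth in the number of distinct words with text length) can be illustrated with a minimal Python sketch. This is not the authors' method: the function names, the toy sentence, and the plain least-squares fit on log-log axes are assumptions chosen for illustration, and such a fit is only a crude estimate of a scaling exponent.

```python
import math
from collections import Counter


def rank_frequency(tokens):
    """Word counts sorted from most to least frequent (rank 1 first)."""
    return sorted(Counter(tokens).values(), reverse=True)


def vocabulary_growth(tokens):
    """Number of distinct words seen after each successive token."""
    seen = set()
    growth = []
    for tok in tokens:
        seen.add(tok)
        growth.append(len(seen))
    return growth


def loglog_slope(xs, ys):
    """Ordinary least-squares slope of log(y) against log(x).

    Hypothetical helper: a rough exponent estimate, not a rigorous fit.
    """
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx = sum(lx) / len(lx)
    my = sum(ly) / len(ly)
    num = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    den = sum((a - mx) ** 2 for a in lx)
    return num / den


# Toy input (an assumption); a real analysis needs a long text or transcript.
tokens = "the cat sat on the mat and the dog sat on the log".split()

freqs = rank_frequency(tokens)        # frequency as a function of rank
growth = vocabulary_growth(tokens)    # distinct words as a function of length

# Slopes on log-log axes: negative for rank-frequency, between 0 and 1
# for vocabulary growth on this toy input.
zipf_slope = loglog_slope(range(1, len(freqs) + 1), freqs)
heaps_slope = loglog_slope(range(1, len(growth) + 1), growth)
```

On a corpus of realistic size, the same two curves would typically be plotted on log-log axes and fitted over a chosen range rather than over every point.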

Cited by 19 publications (25 citation statements)
References: 19 publications
“…Note that, in Fig 5 we also include the results for English spoken language we analyzed in a previous study [30]. The data points obtained from English spoken language fall nicely on the fitting curves, which further validate the functional relation we establish in this study between the empirically observed scaling exponents and the model parameters.…”
Section: Methods and Results (supporting)
confidence: 79%
“…Data points are obtained from the scaling analyses and simulation of all ten Chinese and English language books listed in Table 1, and English spoken language from Ref. [30]. The dotted lines indicate 95% confidence intervals of the data points obtained from empirical and model parameters for each separate book.…”
Section: Methods and Results (mentioning)
confidence: 99%
“…Next, languages were labeled in classes according to the linguistic family to which they belong (Romance, Germanic, Slavic, Uralic). The eight-dimensional vectors comprising the eight ApEn values (pattern lengths 3-10) are used to create the two-dimensional projection. We observe that the families are segregated.…”
Section: Results (mentioning)
confidence: 99%
“…Two representative findings of universal features of natural language are the Zipf and Heaps laws, which are based on word frequency and number of different words, respectively [1][2][3][4]. From a more basic perspective, human language can also be considered as a sequence of symbols which contains information encoded in the patterns (words) needed to communicate.…”
Section: Introduction (mentioning)
confidence: 99%
“…Rank frequency distributions are found in contemporary natural language corpora and Swadesh lists [19][20][21], comparisons across multiple languages [22][23][24][25], in both written and spoken language data [26], across all English literary texts included in Project Gutenberg [27], and historic language data that is not yet translated [28], but, importantly, are not found in random monkey-typing corpora [14,29]. Rank frequency research has expanded beyond a narrow focus on adult, monolingual, native speakers to demonstrate distinct rank frequency distributions for corpora of varying levels of L2 proficiency across users of natural language [30,31] and artificial command languages [32], L1 attritors who have lost proficiency in their L1 over their lifespan [31], different language combinations of spontaneous codeswitching [33], and in languages with varying proportions of non-native speakers [34].…”
Section: Introduction (mentioning)
confidence: 99%