2017
DOI: 10.3390/e19100521

Evaluating the Irregularity of Natural Languages

Abstract: In the present work, we quantify the irregularity of different European languages belonging to four linguistic families (Romance, Germanic, Uralic and Slavic) and an artificial language (Esperanto). We modified a well-known method to calculate the approximate and sample entropy of written texts. We find differences in the degree of irregularity between the families, and our method, which is based on the search for regularities in a sequence of symbols, consistently distinguishes between natural and …
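The abstract names approximate entropy (ApEn) and sample entropy (SampEn) but does not spell out the modification. A minimal Python sketch follows, assuming that two length-m patterns count as similar when their normalized Hamming distance is at most r (r = 0 reduces to exact symbol matching). The function names are hypothetical and this is not the authors' implementation; ApEn follows Pincus's self-match-inclusive definition and SampEn follows Richman and Moorman's pair-counting definition.

```python
import math


def similar(a: str, b: str, r: float) -> bool:
    """Assumed similarity rule: normalized Hamming distance of equal-length patterns <= r."""
    mismatches = sum(1 for x, y in zip(a, b) if x != y)
    return mismatches / len(a) <= r


def _phi(text: str, m: int, r: float) -> float:
    """Mean log-fraction of length-m templates similar to each template (self-matches included)."""
    n = len(text) - m + 1
    templates = [text[i:i + m] for i in range(n)]
    total = 0.0
    for t in templates:
        count = sum(1 for u in templates if similar(t, u, r))  # >= 1: t matches itself
        total += math.log(count / n)
    return total / n


def approximate_entropy(text: str, m: int, r: float) -> float:
    """ApEn(m, r) = Phi^m(r) - Phi^(m+1)(r)."""
    if len(text) < m + 2:
        raise ValueError("text too short for the chosen m")
    return _phi(text, m, r) - _phi(text, m + 1, r)


def sample_entropy(text: str, m: int, r: float) -> float:
    """SampEn(m, r) = -ln(A / B), counting pairs i < j so self-matches are excluded."""
    n = len(text) - m  # same number of templates for lengths m and m + 1
    b = a = 0
    for i in range(n):
        for j in range(i + 1, n):
            if similar(text[i:i + m], text[j:j + m], r):
                b += 1  # B: similar pairs of length m
            if similar(text[i:i + m + 1], text[j:j + m + 1], r):
                a += 1  # A: similar pairs of length m + 1
    if a == 0 or b == 0:
        return math.inf  # undefined: no similar pairs were found
    return -math.log(a / b)
```

For a strictly periodic string such as "ab" * 50 with m = 2 and r = 0, both measures come out near zero, while a random string over four letters gives values approaching ln 4 as the text grows. The pairwise loops are O(N²), so for long segments a faster neighbor search would be preferable.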

Cited by 10 publications (6 citation statements)
References: 29 publications
“…While entropy measures have mostly been used to analyse the distributional laws of linguistics, e.g., concerning word order [18, 19, 20, 21] and word length [22, 23, 24, 25], or for a comparison of languages in terms of ordering preferences and complexity [26, 27, 28, 29, 30], there are also studies that investigated the aesthetic preference and popularity of texts using entropy metrics.…”
Section: Introduction (mentioning)
confidence: 99%
“…Next, we calculated several network metrics for the texts from different languages. As we stated above, in our calculations we considered 5 segments with 15,000 elements and we set , which is a value that roughly corresponds to the mean word length in several languages [26, 36, 37, 38]. The threshold error value is set to .…”
Section: Results (mentioning)
confidence: 99%
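The citation statement above computes its metrics over 5 segments of 15,000 elements each. A minimal sketch of that bookkeeping follows, assuming consecutive, non-overlapping segments taken from the start of the text and a caller-supplied per-segment metric; split_segments and segment_summary are hypothetical names, not the citing authors' code.

```python
from statistics import mean, stdev
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_segments(seq: Sequence[T], n_segments: int = 5, seg_len: int = 15_000) -> List[Sequence[T]]:
    """Cut seq into n_segments consecutive, non-overlapping windows of seg_len elements."""
    needed = n_segments * seg_len
    if len(seq) < needed:
        raise ValueError(f"need at least {needed} elements, got {len(seq)}")
    # Assumption: elements beyond n_segments * seg_len are simply ignored.
    return [seq[k * seg_len:(k + 1) * seg_len] for k in range(n_segments)]


def segment_summary(
    seq: Sequence[T],
    metric: Callable[[Sequence[T]], float],
    n_segments: int = 5,
    seg_len: int = 15_000,
) -> Tuple[float, float]:
    """Apply metric to each segment; return its mean and sample standard deviation.

    stdev needs at least two values, so n_segments must be >= 2.
    """
    values = [metric(s) for s in split_segments(seq, n_segments, seg_len)]
    return mean(values), stdev(values)
```

Reporting a spread across segments, rather than a single whole-text value, gives a rough sense of how stable a metric is within one language's text.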
“…Our goal is to analyze the recurrence of patterns along the text, by examining the spatio-temporal organization of these patterns from a network science perspective. Recently, we reported the application of methods such as approximate entropy to the study of irregularities displayed by some natural languages [26, 27, 28]. In our approach, we consider the similarity between two patterns of length m based on the Hamming distance between them.…”
Section: Introduction (mentioning)
confidence: 99%
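The citation statement above compares length-m patterns by Hamming distance and studies their recurrence as a network. One possible construction is sketched below, assuming that nodes are the starting positions of length-m windows and that two nodes are linked when their windows differ in at most max_dist positions; recurrence_network and edge_density are hypothetical names, and edge density is only an example metric, not necessarily one the citing authors report.

```python
from itertools import combinations
from typing import Dict, Set


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(1 for x, y in zip(a, b) if x != y)


def recurrence_network(text: str, m: int, max_dist: int) -> Dict[int, Set[int]]:
    """Adjacency sets: i and j are linked when their length-m windows are within max_dist."""
    n = len(text) - m + 1
    windows = [text[i:i + m] for i in range(n)]
    adj: Dict[int, Set[int]] = {i: set() for i in range(n)}
    for i, j in combinations(range(n), 2):  # i < j, so no self-loops
        if hamming(windows[i], windows[j]) <= max_dist:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def edge_density(adj: Dict[int, Set[int]]) -> float:
    """Fraction of possible undirected edges that are present."""
    n = len(adj)
    if n < 2:
        return 0.0
    edges = sum(len(nbrs) for nbrs in adj.values()) // 2
    return 2 * edges / (n * (n - 1))
```

With max_dist = 0 the network links only exact repeats of a pattern; raising it lets near-repeats (for example, inflected forms of the same word stem) connect as well.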
“…In van Cranenburgh et al (2019), summary statistics derived from topic modeling (Latent Dirichlet Allocation) and paragraph vectors are used to predict degrees of “literariness.” Maharjan et al (2017) explore a wide variety of features (including “readability”) that can be used to classify texts in terms of “likability.” Other standard methods of computational linguistics used in this context include sentiment and emotion analysis (Alm and Sproat, 2005; Francisco and Gervás, 2006; Kakkonen and Galić Kakkonen, 2011; Mohammad, 2011; Reagan et al, 2016; Maharjan et al, 2018). Global statistical properties such as complexity and entropy have been used to study the regularity (Mehri and Lashkari, 2016; Hernández-Gómez et al, 2017) and the quality of texts (Febres and Jaffe, 2017). Fractal analysis, which figures centrally in our study, has been applied to fictional texts as well (Drożdż and Oświęcimka, 2015; Mehri and Lashkari, 2016; Chatzigeorgiou et al, 2017), and fractal patterns have been observed in both Western (Drożdż et al, 2016) and Chinese literature (Yang et al, 2016; Chen and Liu, 2018).…”
Section: Introduction (mentioning)
confidence: 99%