2014
DOI: 10.5715/jnlp.21.877

Study on Constants of Natural Language Texts

Abstract: This paper considers different measures that might become constants for any length of a given natural language text. Such measures indicate a potential for studying the complexity of natural language but have previously only been studied using relatively small English texts. In this study, we consider measures for texts in languages other than English, and for large-scale texts. Among the candidate measures, we consider Yule's K, Orlov's Z, and Golcher's VM, each of whose convergence has been previously argue…
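
For readers unfamiliar with the first candidate measure, here is a minimal sketch of Yule's K in its commonly cited form, K = 10^4 · (Σ_m m² V(m, N) − N) / N², where N is the number of tokens and V(m, N) is the number of distinct words occurring exactly m times. The function names `yules_k` and `k_by_prefix` are illustrative and do not come from the paper, and whitespace tokenization is an assumption; this is not the authors' implementation.

```python
from collections import Counter


def yules_k(tokens):
    """Yule's K = 10^4 * (sum_m m^2 * V(m, N) - N) / N^2."""
    n = len(tokens)
    if n == 0:
        raise ValueError("need at least one token")
    freq = Counter(tokens)             # word -> number of occurrences
    spectrum = Counter(freq.values())  # m -> V(m, N): words occurring exactly m times
    s2 = sum(m * m * v for m, v in spectrum.items())
    return 1e4 * (s2 - n) / (n * n)


def k_by_prefix(tokens, step):
    """Hypothetical helper: K on growing prefixes, to see whether it levels off."""
    return [(n, yules_k(tokens[:n])) for n in range(step, len(tokens) + 1, step)]


if __name__ == "__main__":
    sample = "a b a c".split()  # toy input; assumed whitespace tokenization
    print(yules_k(sample))
    print(k_by_prefix(sample, 2))
```

If a measure is roughly constant in text length, the values returned by `k_by_prefix` should stabilize as the prefix grows; a toy input like the one above is far too small to show this.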

citations
Cited by 3 publications
(1 citation statement)
references
References 11 publications
“…The type-token ratio (TTR) can also be considered a variant of the species diversity equation, and is text size dependent, however, it is a popular metric, and we mitigate any size impacts and avoid issues near the asymptote of word counts by keeping all the samples of each novel of equal size and at 4000 words, well below 10,000 where other techniques perform better; further, we use Richness as part of a larger multivariate technique (Juola & Mikros, 2016; Kimura & Tanaka-Ishii, 2014; Kubát & Milička, 2013; Tanaka-Ishii & Aihara, 2015; Van Gijsel, Speelman, & Geeraerts, 2005; Vermeer, 2000).…”
Section: Richness (R)
Citation type: mentioning
confidence: 99%
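
The citing passage controls for the length dependence of the type-token ratio by comparing samples of equal size. A minimal sketch of that idea follows, assuming the text is already tokenized into a list of words; `type_token_ratio` and `fixed_size_samples` are hypothetical names, and the default of 4000 words mirrors the sample size mentioned in the quotation rather than any published code.

```python
def type_token_ratio(tokens):
    """TTR = number of distinct words (types) / number of words (tokens)."""
    if not tokens:
        raise ValueError("need at least one token")
    return len(set(tokens)) / len(tokens)


def fixed_size_samples(tokens, size=4000):
    """Split a token list into consecutive, non-overlapping samples of exactly
    `size` tokens; a trailing remainder shorter than `size` is dropped."""
    return [tokens[i:i + size] for i in range(0, len(tokens) - size + 1, size)]


if __name__ == "__main__":
    words = "the cat sat on the mat and the dog sat too".split()  # toy input
    for sample in fixed_size_samples(words, size=4):
        print(sample, type_token_ratio(sample))
```

Because every sample has the same number of tokens, differences in TTR between samples reflect vocabulary variety rather than sample length.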