2017
DOI: 10.20944/preprints201704.0180.v1
Preprint

The Entropy of Words—Learnability and Expressivity across More Than 1000 Languages

Abstract: The choice associated with words is a fundamental property of natural languages. It lies at the heart of quantitative linguistics, computational linguistics, and the language sciences more generally. Information theory gives us tools to measure precisely the average amount of choice associated with words: the word entropy. Here we use three parallel corpora, encompassing ca. 450 million words in 1916 texts and 1259 languages, to tackle some of the major conceptual and practical problems of word en…
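As a rough illustration of "the average amount of choice associated with words", the sketch below computes the Shannon entropy of a unigram word distribution, H = -Σ p(w) log2 p(w), using the simple plug-in (maximum likelihood) estimate of p(w). The function name `word_entropy` and the toy sentence are hypothetical, and the plug-in estimator is an assumption made here for brevity; it is not necessarily the estimator used in the preprint.

```python
import math
from collections import Counter


def word_entropy(tokens):
    """Plug-in estimate of unigram word entropy, in bits per word token."""
    counts = Counter(tokens)
    total = sum(counts.values())
    # Sum -p * log2(p) over the observed word types.
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Toy example: whitespace tokenization of a made-up sentence.
tokens = "the cat saw the dog and the dog saw the cat".split()
print(word_entropy(tokens))
```

A text that repeats a single word has entropy 0, while a text in which every one of N word types is equally frequent has entropy log2(N) bits.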

Citations: cited by 12 publications (8 citation statements)
References: 46 publications (89 reference statements)
“…If users must convey an idea only infrequently, then they can invest effort in longer words to ensure that the idea is communicated clearly [14]. Evidence supporting ZLA has been found in each of the nearly 1,000 human languages where it has been sought [15], and the law applies to both spoken language [16, 17] and written characters [18, 19]. Researchers have reported mixed support for ZLA in the vocal communication of other animals [20], including primates [17, 21–25], cetaceans [26], bats [27], and hyraxes [28].…”
Section: Introduction
confidence: 99%
“…First, relative to the number of words in human languages, the number of note types used by most bird populations is small. A small number of note types makes it more difficult to detect a significant concordance between note type frequency and duration [15, 17]. If the number of note types is very small, as in the calls of African penguins [34], then even a perfect concordance between the frequency and duration of note types may provide only weak evidence for or against ZLA.…”
Section: Introduction
confidence: 99%
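The point about small inventories in the statement above can be made concrete with an exact permutation test. The sketch below is a hypothetical illustration, not the citing paper's method: it assumes untied frequency and duration ranks, uses Kendall's tau as the concordance measure, and the names `kendall_tau` and `min_one_sided_p` are invented here. It computes the smallest one-sided p-value that a perfect negative concordance between frequency and duration can reach with n types, which is 1/n!.

```python
from itertools import permutations


def kendall_tau(x, y):
    """Kendall's tau for two equally long sequences without ties."""
    n = len(x)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            s += (prod > 0) - (prod < 0)  # +1 concordant pair, -1 discordant pair
    return s / (n * (n - 1) / 2)


def min_one_sided_p(n):
    """Exact permutation p-value when duration ranks perfectly reverse frequency ranks."""
    freq_ranks = list(range(n))
    observed = kendall_tau(freq_ranks, freq_ranks[::-1])  # perfect negative concordance
    perms = list(permutations(range(n)))
    hits = sum(1 for p in perms if kendall_tau(freq_ranks, p) <= observed)
    return hits / len(perms)


for n in (3, 4, 5):
    print(n, min_one_sided_p(n))
```

With three types, only one of the six possible orderings is a perfect reversal, so even a perfect pattern cannot fall below p = 1/6 under these assumptions.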