The importance of statistical patterns of language has been debated for decades. Although Zipf's law is perhaps the most popular case, Menzerath's law has recently entered the debate. Menzerath's law manifests in language, music and genomes as a tendency of the mean size of the parts to decrease as the number of parts increases. In genomes, for instance, it emerges as a tendency of species with more chromosomes to have a smaller mean chromosome size. It has been argued that the instantiation of this law in genomes is not indicative of any parallel between language and genomes because (a) the law is inevitable and (b) noncoding DNA dominates genomes. Here the mathematical, statistical, and conceptual challenges of these criticisms are discussed. Two major conclusions are drawn: the law is not inevitable, and languages also have a correlate of noncoding DNA. However, the wide range of manifestations of the law in and outside genomes suggests that the striking similarities between noncoding DNA and certain linguistic units could be anecdotal for understanding the recurrence of that statistical law.
2012 Wiley Periodicals, Inc. Complexity 18: 11-17, 2013
Key Words: statistical laws; language; genomes; music; non-coding DNA; Menzerath's law
INTRODUCTION
Attempts to demonstrate that statistical patterns of language have a trivial explanation have a long history, going back at least to the research of G. A. Miller and collaborators, who questioned the relevance of Zipf's law for word frequencies around 1960 [1-3]. Zipf's law states that the curve relating the frequency of a word, f, to its rank, r (the most frequent word having rank 1, the second most frequent word having rank 2, and so on), should follow $f \sim r^{-\alpha}$ [4]. Miller argued that if monkeys were chained "to typewriters until they had produced some very long and random sequence of characters" one would find "exactly the same 'Zipf curves' for the monkeys as for the human authors" [3]. In his view, Zipf's law would be an inevitable consequence of the fact that words are made of units, e.g., letters or phonemes. The typewriter argument has been revived many times since then [5-8]. However, rigorous analyses indicate that the curves do not really look the same and that parameters of this random typing model giving a good fit to real word frequencies are not forthcoming [9,10]. More recently, it has been claimed that the finding of another statistical pattern of language, Menzerath's law, is also inevitable [11]. P. Menzerath hypothesized that "the greater the whole, the smaller its constituents" ("Je größer das Ganze, desto kleiner die Teile") in the context of language [12] (p. 101). Converging research in music and genomes [13-16] suggests that Menzerath's law is a general law of natural and human-made systems. In this article, we reserve the term Menzerath-Altmann law for the exact mathematical dependency that has been proposed by the quantitative linguistics traditi...