“…This assertion neglects a fundamental property of the frequency distribution of words (Baayen, 2001; Baroni, 2008; Zanette & Montemurro, 2005; Zipf, 1935/1965), but also of word sequences (Bannard & Lieven, 2009; Baroni, 2008; Ha, Sicilia-Garcia, Ming, & Smith, 2002), in human languages: its Zipfian nature, which has been observed in each analysed natural language and for all the lengths of texts and corpora from a few thousand words up to several tens of millions. In any text or corpus, 'a few words occur with very high frequency while many words occur but rarely' (Zipf, 1935/1965), and this overrepresentation of rare items is larger for smaller texts and corpora (Baayen, 2001; McEnery & Gabrielatos, 2006; Zeldes, 2013; Zipf, 1935/1965). However, when the same normalized frequency threshold is used in corpora of different sizes, this overrepresentation of rare words and rare sequences in the smaller corpora is not taken into account and a disproportionately large number of word sequences is selected from them.…”
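
The effect of a fixed normalized threshold can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not taken from the passage: the threshold of 10 occurrences per million words, the corpus sizes, the idealised Zipfian distribution with exponent 1 over a 50,000-type vocabulary, and the function names `raw_threshold`, `zipf_sample` and `share_selected`. Single tokens stand in for word sequences.

```python
import random
from collections import Counter


def raw_threshold(per_million, corpus_size):
    """Smallest raw count whose normalized frequency reaches the threshold.

    per_million is an integer number of occurrences per million words.
    """
    # Integer ceiling division avoids floating-point rounding issues.
    needed = (per_million * corpus_size + 999_999) // 1_000_000
    return max(1, needed)


def zipf_sample(n_tokens, vocab_size=50_000, seed=0):
    """Draw n_tokens items with probability proportional to 1 / rank.

    This is an idealised stand-in for a real corpus, not real data.
    """
    rng = random.Random(seed)
    weights = [1.0 / rank for rank in range(1, vocab_size + 1)]
    return rng.choices(range(vocab_size), weights=weights, k=n_tokens)


def share_selected(tokens, per_million):
    """Proportion of observed types that pass the normalized threshold."""
    counts = Counter(tokens)
    cutoff = raw_threshold(per_million, len(tokens))
    selected = sum(1 for count in counts.values() if count >= cutoff)
    return selected / len(counts)


if __name__ == "__main__":
    for size in (20_000, 500_000):
        tokens = zipf_sample(size)
        print(size, raw_threshold(10, size), share_selected(tokens, 10))
```

Under these assumptions, a threshold of 10 per million corresponds to a raw count of 100 in a 10-million-word corpus but to a raw count of only 1 in a 100,000-word corpus, so in the smaller corpus every item that occurs at all, including every hapax, passes the threshold.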