“…Recently, a growing body of work has sought to understand how these language models (LMs) fit the distribution of a language beyond standard measures such as perplexity. Meister & Cotterell (2021), for example, investigated the statistical tendencies of the distribution defined by neural LMs, whereas Kulikov et al. (2021) explored whether they adequately capture the modes of the distribution they attempt to model. At the same time, increased focus has been given to performance on rare or novel events in the data distribution, both for models of natural language (McCoy et al., 2021; Lent et al., 2021; Dudy & Bedrick, 2020; Oren et al., 2019) and for neural models more generally (see, for example, Sagawa et al., 2020; D'souza et al., 2021; Blevins & Zettlemoyer, 2020; Czarnowska et al., 2019; Horn & Perona, 2017; Ouyang et al., 2016; Bengio, 2015; Zhu et al., 2014).…”