“…Due to the negative sign, the criterion prefers pseudo-samples that lead to flatter maxima of the likelihood. In line with recent insights into sharp and flat minima of loss surfaces [Dinh et al., 2017; Li et al., 2018; Andriushchenko and Flammarion, 2022], such a penalty can be expected to improve generalization. The lower the curvature, the more probability mass (area under the likelihood) is expected on…”