“…The papers Bartlett [1998], Bartlett and Mendelson [2002], Ledent et al. [2019], and Neyshabur et al. [2015b] derive bounds using fat-shattering dimensions [Kearns and Schapire, 1994; Anthony and Bartlett, 2009, Section 11.3] or Rademacher/Gaussian complexities [Shalev-Shwartz and Ben-David, 2014, Chapter 26] of sparse or ℓ1-related classes of neural networks. Such bounds translate into misclassification bounds or risk bounds for empirical risk minimizers over those classes; see, for example, [Bartlett, 1998, Section 2] and [Bartlett and Mendelson, 2002, Theorem 8], respectively.…”