2019
DOI: 10.48550/arxiv.1905.12430
Preprint

Norm-based generalisation bounds for multi-class convolutional neural networks

Abstract: We show generalisation error bounds for deep learning with two main improvements over the state of the art. (1) Our bounds have no explicit dependence on the number of classes except for logarithmic factors. This holds even when formulating the bounds in terms of the L2-norm of the weight matrices, where previous bounds exhibit at least a square-root dependence on the number of classes. (2) We adapt the classic Rademacher analysis of DNNs to incorporate weight sharing, a task of fundamental theoretical import…

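To make the flavour of such norm-based capacity terms concrete, here is a minimal NumPy sketch (an illustration only, not the paper's actual bound) that computes per-layer L2 norms of a toy CNN's weight tensors and a crude product-of-norms proxy of the kind that appears in norm-based generalisation bounds; the layer shapes, the 1000-class output, and the product form are assumptions made for this example.

```python
# Illustrative sketch only: a toy stand-in for norm-based capacity terms,
# not the exact quantity bounded in the paper.
import numpy as np

rng = np.random.default_rng(0)

# Toy "network": two convolutional filter banks and a final classification matrix.
conv1 = rng.normal(size=(16, 3, 3, 3))   # (out_channels, in_channels, k, k)
conv2 = rng.normal(size=(32, 16, 3, 3))
fc    = rng.normal(size=(1000, 32))      # 1000-class output layer (assumed for illustration)

def l2_norm(w):
    """Frobenius / L2 norm over all entries of a weight tensor."""
    return np.sqrt(np.sum(w ** 2))

layer_norms = [l2_norm(conv1), l2_norm(conv2), l2_norm(fc)]
capacity_proxy = np.prod(layer_norms)    # crude product-of-norms proxy
print(layer_norms, capacity_proxy)
```

Note that for convolutional layers such norms are taken over the shared filter weights rather than the much larger effective linear map, which is one way weight sharing can enter a capacity analysis.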
Cited by 3 publications (6 citation statements)
References 23 publications
“…Other bounds for the misclassification probabilities and prediction errors of risk minimizers can be derived from Bartlett and Mendelson [2002] and Ledent et al. [2019]. For example, Bartlett and Mendelson [2002, Theorem 18] entails bounds for ℓ1-regularized empirical risk minimization over two-layer neural networks; the bounds are similar to the ones in our Corollary 1 when L = 1.…”
Section: Related Literature
confidence: 71%
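As a rough illustration of the kind of estimator this statement refers to, the sketch below runs ℓ1-regularized empirical risk minimization over a two-layer network by plain (sub)gradient descent; the data, network width, and hyperparameters are made up for the example and are not taken from Bartlett and Mendelson [2002].

```python
# Illustrative sketch (not the construction in Bartlett and Mendelson [2002]):
# l1-regularized ERM over a two-layer network, trained by (sub)gradient descent
# on squared loss + lam * ||weights||_1.
import numpy as np

rng = np.random.default_rng(0)
n, d, h = 200, 5, 20                       # samples, input dimension, hidden units
X = rng.normal(size=(n, d))
y = np.sign(X[:, 0] - X[:, 1])             # simple binary target in {-1, +1}

W1 = 0.1 * rng.normal(size=(d, h))
w2 = 0.1 * rng.normal(size=h)
lam, lr = 1e-3, 0.05

for _ in range(3000):
    H = np.tanh(X @ W1)                    # hidden representation
    err = H @ w2 - y                       # squared-loss residual
    grad_w2 = H.T @ err / n + lam * np.sign(w2)
    grad_W1 = X.T @ ((err[:, None] * w2) * (1 - H ** 2)) / n + lam * np.sign(W1)
    w2 -= lr * grad_w2
    W1 -= lr * grad_W1

print("training error:", np.mean(np.sign(np.tanh(X @ W1) @ w2) != y))
```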
“…The papers Bartlett [1998], Bartlett and Mendelson [2002], Ledent et al. [2019], Neyshabur et al. [2015b] derive bounds by using fat-shattering dimensions [Kearns and Schapire, 1994; Anthony and Bartlett, 2009, Section 11.3] or Rademacher/Gaussian complexities [Shalev-Shwartz and Ben-David, 2014, Chapter 26] of sparse or ℓ1-related classes of neural networks. Such bounds translate into misclassification bounds or risk bounds for empirical risk minimizers over those classes; see, for example, [Bartlett, 1998, Section 2] and [Bartlett and Mendelson, 2002, Theorem 8], respectively.…”
Section: Related Literature
confidence: 99%
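For readers unfamiliar with the complexity measure these works bound, the following sketch gives a Monte-Carlo estimate of the empirical Rademacher complexity of a toy ℓ1-ball linear class, for which the supremum has a simple closed form; the class and sample are hypothetical and far simpler than the neural-network classes analysed in the cited papers.

```python
# Illustrative sketch: Monte-Carlo estimate of the empirical Rademacher complexity
# of the l1-ball linear class F = { x -> <w, x> : ||w||_1 <= B } on a fixed sample.
# For this class,
#   sup_{||w||_1 <= B} (1/n) sum_i sigma_i <w, x_i> = (B/n) * || sum_i sigma_i x_i ||_inf.
import numpy as np

rng = np.random.default_rng(0)
n, d, B = 100, 30, 1.0
X = rng.normal(size=(n, d))                # fixed sample S

def rademacher_complexity(X, B, n_draws=2000):
    n = X.shape[0]
    vals = []
    for _ in range(n_draws):
        sigma = rng.choice([-1.0, 1.0], size=n)   # Rademacher signs
        vals.append(B / n * np.max(np.abs(sigma @ X)))
    return np.mean(vals)

print("empirical Rademacher complexity:", rademacher_complexity(X, B))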
“…The other line of work on the capacity of deep nets was motivated by [5], which derived capacity estimates for shallow nets in which the ℓ1 norm of all free parameters is bounded. In particular, [6,49,25,4,37] provided size-independent capacity estimates (via different measurements) for deep nets under strict restrictions on the magnitude of the free parameters.…”
Section: 1
confidence: 99%
“…Since it is difficult to derive tight bias (or approximation error) estimates for deep nets under the restrictions required in [6,49,25,4,37], most of the work on the generalization error of deep learning is based on the capacity estimates in [5,27,7]. For example, using the covering number estimates in [5,27], [4] proved that ERM on under-parameterized deep sigmoid nets (deep nets with sigmoid activation functions) can achieve the optimal generalization error bounds for hierarchical interaction models established in [33]; [41] showed that ERM on under-parameterized deep sigmoid nets can capture the spatial sparseness of the regression function and achieve the almost optimal generalization error bounds given in [21]; [20] proved that ERM on under-parameterized deep sigmoid nets can realize the rotation-invariance features of the regression function and attain the almost optimal generalization error bounds.…”
Section: 1
confidence: 99%
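As a loose illustration of "ERM on under-parameterized deep sigmoid nets" (the estimator only, not the approximation theory of the cited works), the sketch below fits a small two-hidden-layer sigmoid network, with far fewer weights than training samples, by gradient descent on the empirical squared risk; the target function, widths, and learning rate are assumptions for the example.

```python
# Illustrative sketch (an assumption, not the exact setting of the cited works):
# ERM with an under-parameterized deep sigmoid net, i.e. far fewer weights than
# training samples, fitted by plain gradient descent on squared risk.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
n, d, h1, h2 = 500, 4, 8, 8                # roughly 100 weights for 500 samples
X = rng.uniform(-1, 1, size=(n, d))
y = np.sin(np.pi * X[:, 0]) + 0.1 * rng.normal(size=n)   # smooth regression target

W1 = rng.normal(size=(d, h1))
W2 = rng.normal(size=(h1, h2))
w3 = rng.normal(size=h2)
lr = 0.3

for _ in range(5000):                      # gradient descent on empirical squared risk
    H1 = sigmoid(X @ W1)
    H2 = sigmoid(H1 @ W2)
    err = H2 @ w3 - y
    d3 = err / n
    d2 = (d3[:, None] * w3) * H2 * (1 - H2)
    d1 = (d2 @ W2.T) * H1 * (1 - H1)
    w3 -= lr * (H2.T @ d3)
    W2 -= lr * (H1.T @ d2)
    W1 -= lr * (X.T @ d1)

print("empirical risk:", np.mean((sigmoid(sigmoid(X @ W1) @ W2) @ w3 - y) ** 2))
```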