“…The papers Bartlett [1998], Bartlett and Mendelson [2002], Ledent et al. [2019], and Neyshabur et al. [2015b] derive bounds using fat-shattering dimensions [Kearns and Schapire, 1994; Anthony and Bartlett, 2009, Section 11.3] or Rademacher/Gaussian complexities [Shalev-Shwartz and Ben-David, 2014, Chapter 26] of sparse or ℓ1-related classes of neural networks. Such bounds translate into misclassification bounds or risk bounds for empirical risk minimizers over those classes; see, for example, [Bartlett, 1998, Section 2] and [Bartlett and Mendelson, 2002, Theorem 8], respectively.…”