2021
DOI: 10.1017/s0962492921000039
Fit without fear: remarkable mathematical phenomena of deep learning through the prism of interpolation

Abstract: In the past decade the mathematical theory of machine learning has lagged far behind the triumphs of deep neural networks on practical challenges. However, the gap between theory and practice is gradually starting to close. In this paper I will attempt to assemble some pieces of the remarkable and still incomplete mathematical mosaic emerging from the efforts to understand the foundations of deep learning. The two key themes will be interpolation and its sibling over-parametrization. Interpolation corresponds …

Cited by 72 publications (48 citation statements)
References 35 publications
“…Despite being highly complex with the ability to even fit random labels and often trained to interpolate the training data, they achieve state-of-the-art out-of-sample generalization performance across a broad range of domains (Zhang et al, 2021). A partial explanation has been provided by the double-descent phenomenon (Belkin et al, 2019a; Belkin, 2021). Extending the generalization curve beyond the interpolation threshold reveals two regimes: the classical U-curve in the underparameterized regime and a monotonically decreasing curve in the overparameterized regime.…”
Section: Motivation and Related Work
Mentioning confidence: 99%
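The two regimes described in this excerpt can be reproduced with a toy model. The sketch below is only illustrative and is not an experiment from the cited works: it fits ridgeless (minimum-norm) least squares on random ReLU features of a synthetic linear-plus-noise dataset and sweeps the number of features p past the interpolation threshold p ≈ n; the test error typically rises towards the threshold and then descends again in the overparameterized regime.

```python
# Illustrative double-descent sketch (toy data and model chosen by me, not from the cited papers).
import numpy as np

rng = np.random.default_rng(0)
n_train, n_test, d = 100, 1000, 20

# Ground-truth linear signal plus label noise.
w_star = rng.normal(size=d) / np.sqrt(d)
X_train = rng.normal(size=(n_train, d))
X_test = rng.normal(size=(n_test, d))
y_train = X_train @ w_star + 0.1 * rng.normal(size=n_train)
y_test = X_test @ w_star

def random_relu_features(X, W):
    """Map inputs through fixed random weights W followed by a ReLU."""
    return np.maximum(X @ W, 0.0)

# Sweep the number of random features p across the interpolation threshold p ~ n_train.
for p in [10, 50, 90, 100, 110, 200, 500, 2000]:
    W = rng.normal(size=(d, p)) / np.sqrt(d)          # fixed random first layer
    Phi_train = random_relu_features(X_train, W)
    Phi_test = random_relu_features(X_test, W)
    # Minimum-norm least squares: pinv handles both under- and over-parameterized cases.
    beta = np.linalg.pinv(Phi_train) @ y_train
    train_mse = np.mean((Phi_train @ beta - y_train) ** 2)
    test_mse = np.mean((Phi_test @ beta - y_test) ** 2)
    print(f"p={p:5d}  train MSE={train_mse:8.4f}  test MSE={test_mse:8.4f}")
```

In a typical run the train MSE reaches zero once p exceeds n_train, while the test MSE peaks near that threshold and then decreases again as p grows, matching the U-curve-plus-descent picture quoted above.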
“…Several recent works have investigated the nature of modern Deep Neural Networks (DNNs) past the point of zero training error (Belkin, 2021; Nakkiran et al, 2020; Bartlett et al, 2021; Power et al, 2022). The stage at which the training error reaches zero is called the Interpolation Threshold (IT), since at this point, the learned network function interpolates between training samples.…”
Section: Introduction
Mentioning confidence: 99%
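For concreteness, the interpolation threshold mentioned in this excerpt can be stated as follows; the notation is mine, not taken from the cited papers.

```latex
% A predictor f_\theta interpolates the training set S = \{(x_i, y_i)\}_{i=1}^n when
\[
  \widehat{R}_n(\theta) \;=\; \frac{1}{n}\sum_{i=1}^{n} \ell\bigl(f_\theta(x_i),\, y_i\bigr) \;=\; 0
  \quad\Longleftrightarrow\quad
  f_\theta(x_i) = y_i \quad \text{for all } i = 1,\dots,n,
\]
% assuming a loss with \ell(y, y') = 0 iff y = y'. The interpolation threshold is then the
% smallest model capacity (e.g. parameter count) at which such a zero-training-error
% solution is first attained.
```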
“…This property is nowadays called the benign overfitting (BO) phenomenon [4, 2] and has been the subject of many recent works in the statistical community. The motivation is to identify situations where benign overfitting holds, that is, when an estimator with a perfect fit on the training data can still generalize well.…”
Section: Introduction
Mentioning confidence: 99%
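A concrete toy instance of benign overfitting is sketched below; the dimensions, noise level, and covariance structure are my own illustrative choices, not taken from the cited works. The minimum-norm linear interpolator fits the noisy labels exactly, yet typically predicts far better than the trivial zero predictor, because the signal sits in a few high-variance directions while the label noise is absorbed by the many low-variance ones.

```python
# Illustrative benign-overfitting sketch: min-norm interpolation of noisy labels in high dimension.
import numpy as np

rng = np.random.default_rng(1)
n, d = 50, 2000                        # far more features than samples

# Anisotropic features: a handful of strong directions, many weak "junk" directions.
scales = np.concatenate([np.full(5, 1.0), np.full(d - 5, 0.02)])
w_star = np.zeros(d)
w_star[:5] = 1.0                       # signal lives only in the strong directions

X_train = rng.normal(size=(n, d)) * scales
X_test = rng.normal(size=(5000, d)) * scales
y_train = X_train @ w_star + 0.5 * rng.normal(size=n)   # noisy labels
y_test = X_test @ w_star                                  # clean targets

# Minimum-norm interpolator: fits the noisy training labels exactly.
w_hat = np.linalg.pinv(X_train) @ y_train

print("train MSE:", np.mean((X_train @ w_hat - y_train) ** 2))   # ~0: perfect fit
print("test  MSE:", np.mean((X_test @ w_hat - y_test) ** 2))     # typically small
print("null  MSE:", np.mean(y_test ** 2))                         # baseline: predict zero
```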
“…We consider this model as a benchmark because, while it likely does not reflect real-world data, it is expected to be universal in the sense that results obtained in other, more realistic statistical models can be compared with, or tend to, those obtained in this ideal benchmark Gaussian model. The relevance of approximating large neural networks by linear models via the neural tangent kernel feature map [19, 18] in certain regimes has been discussed extensively in the machine learning community, for instance in [4, 28, 1] and references therein.…”
Section: Introduction
Mentioning confidence: 99%
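The linear-model approximation via the neural tangent kernel referred to in this excerpt can be sketched as follows; the two-layer tanh architecture, the 1/sqrt(m) output scaling, and the toy regression target are my own assumptions for illustration. The network is expanded to first order around its initialization, and the parameter correction is fit by least squares in the resulting tangent features.

```python
# Hedged sketch of the linearized-network (neural tangent kernel) approximation:
#   f_lin(x; theta) = f(x; theta0) + grad_theta f(x; theta0) . (theta - theta0).
import numpy as np

rng = np.random.default_rng(2)
d, m, n = 5, 200, 40                     # input dim, hidden width, training points

W0 = rng.normal(size=(m, d))             # initialization theta0 = (W0, a0)
a0 = rng.normal(size=m)

def forward(X, W, a):
    """Two-layer tanh network with 1/sqrt(m) output scaling."""
    return np.tanh(X @ W.T) @ a / np.sqrt(m)

def tangent_features(X, W, a):
    """Gradient of f(x; theta) with respect to all parameters, flattened per example."""
    H = np.tanh(X @ W.T)                 # (n, m) hidden activations
    dH = 1.0 - H ** 2                    # derivative of tanh
    grad_a = H / np.sqrt(m)              # (n, m)
    # grad_W[i, j, :] = a[j] * tanh'(w_j . x_i) * x_i / sqrt(m)
    grad_W = (a * dH)[:, :, None] * X[:, None, :] / np.sqrt(m)   # (n, m, d)
    return np.concatenate([grad_a, grad_W.reshape(len(X), -1)], axis=1)

X = rng.normal(size=(n, d))
y = np.sin(X[:, 0])                      # toy regression target

Phi = tangent_features(X, W0, a0)        # tangent (NTK) feature map at initialization
residual = y - forward(X, W0, a0)
delta = np.linalg.pinv(Phi) @ residual   # min-norm correction in parameter space

pred_lin = forward(X, W0, a0) + Phi @ delta
print("linearized-model train MSE:", np.mean((pred_lin - y) ** 2))   # ~0: interpolates
# The NTK Gram matrix used in kernel-style analyses is simply Phi @ Phi.T.
```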