A brief prehistory of double descent
PNAS, 2020. DOI: 10.1073/pnas.2001875117

Cited by 44 publications (32 citation statements). References 8 publications.
“…When the labels have nonzero noise σ² > 0 (Fig. 2d, e), generalization error is non-monotonic with a peak, a feature that has been named “double-descent” [3, 37]. By decomposing E_g into the bias and the variance of the estimator, we see that the non-monotonicity is caused solely by the variance (Fig.…”
Section: Results (mentioning)
confidence: 99%
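A minimal sketch of the effect described in this snippet, using ridgeless (minimum-norm) linear regression rather than the cited paper's kernel setup; the dimensions, noise level, and trial counts are illustrative choices, not taken from the cited work. It estimates the bias-variance decomposition by Monte Carlo over training-set draws and shows the error peak near n = d coming from the variance term.

```python
# Hypothetical toy setup (not the cited paper's kernel-regression model):
# minimum-norm least squares with noisy labels, sweeping the sample size n
# through the number of parameters d.  The bias^2 term falls monotonically,
# while the variance term peaks near n = d and produces the double-descent bump.
import numpy as np

rng = np.random.default_rng(0)
d = 30           # number of features (fixed model size) -- illustrative value
sigma = 0.5      # label-noise standard deviation
n_trials = 200   # training-set draws for the Monte Carlo decomposition
n_test = 500

w_star = rng.normal(size=d) / np.sqrt(d)   # teacher weights
X_test = rng.normal(size=(n_test, d))
f_star = X_test @ w_star                   # noiseless test targets

for n in [5, 10, 20, 25, 28, 30, 32, 35, 50, 100]:
    preds = np.empty((n_trials, n_test))
    for t in range(n_trials):
        X = rng.normal(size=(n, d))
        y = X @ w_star + sigma * rng.normal(size=n)
        w_hat = np.linalg.pinv(X) @ y      # minimum-norm least-squares fit
        preds[t] = X_test @ w_hat
    mean_pred = preds.mean(axis=0)         # average predictor over training sets
    bias2 = np.mean((mean_pred - f_star) ** 2)
    variance = np.mean((preds - mean_pred) ** 2)
    print(f"n={n:4d}  bias^2={bias2:.3f}  variance={variance:.3f}  "
          f"error={bias2 + variance:.3f}")
```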
“…Finally, when the data labels are noisy or the target function has components not expressible by the kernel, we observe that generalization error can exhibit non-monotonic behavior as a function of the number of samples, contrary to the common intuition that more data should lead to smaller error. This non-monotonic behavior is reminiscent of the recently described “double-descent” phenomenon [3, 5, 37, 38], where generalization error is non-monotonic as a function of model complexity in many modern machine learning models. We show that the non-monotonicity can be mitigated by increasing the implicit or explicit regularization.…”
Section: Introduction (mentioning)
confidence: 84%
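The mitigation claim in this snippet can be illustrated in the same toy ridgeless setting as the sketch above (again not the cited paper's kernel model; the ridge strength and other constants are arbitrary assumptions): an explicit ridge penalty suppresses the error peak near the interpolation point n = d.

```python
# Hypothetical illustration: ridge regularization flattens the peak that the
# unregularized (minimum-norm) solution shows near n = d.
import numpy as np

rng = np.random.default_rng(1)
d, sigma, n_trials, n_test = 30, 0.5, 200, 500   # illustrative constants
w_star = rng.normal(size=d) / np.sqrt(d)
X_test = rng.normal(size=(n_test, d))
f_star = X_test @ w_star

def test_error(n, lam):
    """Average test MSE over training-set draws (lam=0 -> minimum-norm fit)."""
    errs = []
    for _ in range(n_trials):
        X = rng.normal(size=(n, d))
        y = X @ w_star + sigma * rng.normal(size=n)
        if lam == 0.0:
            w_hat = np.linalg.pinv(X) @ y
        else:
            w_hat = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
        errs.append(np.mean((X_test @ w_hat - f_star) ** 2))
    return float(np.mean(errs))

for n in [10, 20, 28, 30, 32, 40, 80]:
    print(f"n={n:3d}  lam=0: {test_error(n, 0.0):9.3f}   "
          f"lam=1: {test_error(n, 1.0):9.3f}")
```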
“…The``double descent"" risk curve was posited by Belkin et al [4] to connect the classical bias-variance trade-off to behaviors observed in overparameterized regimes for a variety of machine learning models. The shape and features of the risk curve itself appear throughout in the literature in a number of contexts, e.g., [21,17,13,12,6,23,1]; see also [14] for a``brief prehistory"" that focuses on the curious peak in the curve. These prior works analyze the risk of linear classification and regression models and neural networks in high-dimensional asymptotic regimes.…”
mentioning
confidence: 99%
“…Although predictors at the interpolation threshold typically have high risk, further increasing the number of parameters (capacity of H) leads to improved generalization. The double descent pattern has been empirically demonstrated for a broad range of datasets and algorithms, including modern deep neural networks (Belkin et al. 2019a, Spigler et al. 2019, Nakkiran et al. 2020), and observed earlier for linear models (Loog et al. 2020). The ‘modern’ regime of the curve, the phenomenon that a large number of parameters often does not lead to over-fitting, has historically been observed in boosting (Schapire et al. 1998, Wyner, Olson, Bleich and Mease 2017) and random forests, including interpolating random forests (Cutler and Zhao 2001), as well as in neural networks (Breiman 1995, Neyshabur et al. 2015).…”
Section: The Double Descent Phenomenon (mentioning)
confidence: 86%
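A minimal sketch of the capacity-sweep version of the curve described in this snippet, using a random-ReLU-features model rather than any of the cited experiments; the feature map, sizes, and noise level are assumptions made for illustration. With the training-set size n fixed, test error spikes when the number of features p reaches the interpolation threshold p ≈ n and then improves again as p grows.

```python
# Hypothetical random-features demo of double descent in model capacity:
# test error of a minimum-norm fit peaks near p = n and descends again
# in the overparameterized regime p >> n.
import numpy as np

rng = np.random.default_rng(2)
d, n, sigma = 10, 40, 0.3          # input dim, training size, label noise
n_test, n_trials = 1000, 50        # illustrative evaluation settings

def relu_features(X, W):
    """Random ReLU features: phi(x) = max(0, W x)."""
    return np.maximum(X @ W.T, 0.0)

w_star = rng.normal(size=d) / np.sqrt(d)
X_test = rng.normal(size=(n_test, d))
f_star = X_test @ w_star

for p in [5, 10, 20, 35, 40, 45, 60, 100, 400]:
    errs = []
    for _ in range(n_trials):
        W = rng.normal(size=(p, d))            # random (untrained) first layer
        X = rng.normal(size=(n, d))
        y = X @ w_star + sigma * rng.normal(size=n)
        Phi, Phi_test = relu_features(X, W), relu_features(X_test, W)
        a = np.linalg.pinv(Phi) @ y            # minimum-norm fit of top layer
        errs.append(np.mean((Phi_test @ a - f_star) ** 2))
    print(f"p={p:4d}  test MSE = {np.mean(errs):.3f}")
```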