2021
DOI: 10.48550/arxiv.2103.12692
Preprint

Benign Overfitting of Constant-Stepsize SGD for Linear Regression

Difan Zou, Jingfeng Wu, Vladimir Braverman, et al.

Abstract: There is an increasing realization that algorithmic inductive biases are central in preventing overfitting; empirically, we often see a benign overfitting phenomenon in overparameterized settings for natural learning algorithms, such as stochastic gradient descent (SGD), where little to no explicit regularization has been employed. This work considers this issue in arguably the most basic setting: constant-stepsize SGD (with iterate averaging) for linear regression in the overparameterized regime. Our main res…
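The setting described in the abstract can be made concrete with a small simulation. Below is a minimal, hypothetical sketch (not the paper's experiment; the dimensions, stepsize, and noise level are illustrative assumptions) of constant-stepsize SGD with iterate averaging on an overparameterized linear regression problem:

import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 1000                        # fewer samples than parameters (overparameterized)
w_star = rng.normal(size=d) / np.sqrt(d)
X = rng.normal(size=(n, d))
y = X @ w_star + 0.1 * rng.normal(size=n)

gamma = 0.01                            # constant stepsize
w = np.zeros(d)
w_sum = np.zeros(d)
num_steps = 10 * n
for t in range(num_steps):
    i = rng.integers(n)                 # one random observation per step
    grad = (X[i] @ w - y[i]) * X[i]     # stochastic gradient of 0.5 * (x_i^T w - y_i)^2
    w = w - gamma * grad
    w_sum += w
w_avg = w_sum / num_steps               # the averaged iterate the analysis concerns

print("train MSE of averaged iterate:", np.mean((X @ w_avg - y) ** 2))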

Cited by 4 publications (15 citation statements) | References 14 publications
“…We refer to Section 4.2 for more distribution details. In light of these bounds, Ours outperforms Bartlett et al (2020) in all the cases, and outperforms Zou et al (2021) in Constant / Piecewise Constant cases if ε < 1/2 and q < min{2 − r, 3/2}.…”
Section: Examples
confidence: 92%
“…Benign Overfitting focuses on deriving non-asymptotic generalization guarantees for overparameterized linear models (Bartlett et al, 2020), which relies on a strict assumption of the feature covariance matrix. Some recent papers focus on deriving benign overfitting under different regimes, e.g., constant-stepsize SGD (Zou et al, 2021), ridge regression (Tsigler and Bartlett, 2020), Random Features (Li et al, 2020b), Gaussian Mixture models (Wang and Thrampoulidis, 2021). This paper relaxes the requirement on the feature covariance matrix by introducing time-variant bounds.…”
Section: Related Work
confidence: 99%
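For readers new to this line of work, the estimator such benign-overfitting analyses typically study is the minimum-l2-norm interpolator of the training data. A minimal, hypothetical sketch follows (the isotropic Gaussian features and dimensions are illustrative assumptions, not the covariance structure required by the cited bounds):

import numpy as np

rng = np.random.default_rng(1)
n, d = 100, 2000                        # overparameterized: d >> n
w_star = rng.normal(size=d) / np.sqrt(d)
X = rng.normal(size=(n, d))
y = X @ w_star + 0.5 * rng.normal(size=n)

# Minimum-l2-norm solution w_hat = X^+ y interpolates the noisy labels exactly.
w_hat = np.linalg.pinv(X) @ y
print("train MSE:", np.mean((X @ w_hat - y) ** 2))       # ~ 0, i.e. the noise is fit perfectly

# With isotropic features, the excess risk reduces to the parameter error ||w_hat - w_star||^2.
print("excess risk:", np.sum((w_hat - w_star) ** 2))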
“…Bias-variance decomposition is widely used in machine learning analysis, e.g., adversarial training [51], double descent [1], uncertainty [25]. This paper considers a slightly different bias-variance decomposition following the analysis of SGD [14,27,57], where high bias means that the model cannot fit the noise data perfectly and high variance means that the model.…”
Section: Related Work
confidence: 99%
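As a reading aid, the SGD-style decomposition referenced in that snippet can be illustrated numerically. The sketch below is a hypothetical construction (problem sizes, stepsize, and noise level are assumptions): the bias part is the error of averaged SGD run on noiseless labels from a zero initialization, while the variance part is the error when SGD is started at the true parameter and only the label noise perturbs it.

import numpy as np

rng = np.random.default_rng(2)
n, d, gamma, steps = 200, 500, 0.01, 2000      # illustrative sizes and constant stepsize
w_star = rng.normal(size=d) / np.sqrt(d)
X = rng.normal(size=(n, d))
noise = 0.5 * rng.normal(size=n)

def averaged_sgd(y, w0):
    """Constant-stepsize SGD with iterate averaging on the squared loss."""
    w, w_sum = w0.copy(), np.zeros(d)
    for _ in range(steps):
        i = rng.integers(n)
        w = w - gamma * (X[i] @ w - y[i]) * X[i]
        w_sum += w
    return w_sum / steps

# Bias: noiseless labels, zero start -- only the initialization error matters.
w_bias = averaged_sgd(X @ w_star, np.zeros(d))
# Variance: noisy labels, started at the optimum -- only the noise matters.
w_var = averaged_sgd(X @ w_star + noise, w_star)

print("bias term     :", np.sum((w_bias - w_star) ** 2))
print("variance term :", np.sum((w_var - w_star) ** 2))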