2022
DOI: 10.48550/arxiv.2204.12446
Preprint

Beyond Lipschitz: Sharp Generalization and Excess Risk Bounds for Full-Batch GD

Abstract: We provide sharp path-dependent generalization and excess error guarantees for the full-batch Gradient Descent (GD) algorithm on smooth losses (possibly non-Lipschitz, possibly non-convex). At the heart of our analysis is a novel generalization error technique for deterministic symmetric algorithms, which shows that average output stability together with a bounded expected gradient of the loss at termination implies generalization. This key result shows that small generalization error occurs at stationary points, and allows …
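For reference, below is a minimal sketch of the full-batch GD update the paper analyzes, assuming a NumPy setting; the least-squares loss, data, step size, and iteration count are hypothetical choices made only for illustration and are not taken from the paper.

import numpy as np

def full_batch_gd(grad_f, w0, eta, T):
    # Full-batch GD: w_{t+1} = w_t - eta * grad_f(w_t).
    # grad_f computes the gradient of the empirical loss over the *whole*
    # training set, so the iteration is deterministic (no mini-batch noise),
    # matching the deterministic symmetric algorithms the paper studies.
    w = np.asarray(w0, dtype=float)
    for _ in range(T):
        w = w - eta * grad_f(w)
    return w

# Hypothetical least-squares example, f(w) = ||Xw - y||^2 / (2n),
# chosen only to make the sketch runnable.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ rng.normal(size=5)
grad = lambda w: X.T @ (X @ w - y) / len(y)  # gradient of f at w
w_hat = full_batch_gd(grad, np.zeros(5), eta=0.1, T=500)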

Cited by 3 publications (13 citation statements)
References 18 publications (33 reference statements)
“…(5) under the regime T = O(n); this is suboptimal in the rate of n, but holds for a more general choice of T. Similar results hold for GD under realizable smooth SCO (Nikolakakis et al., 2022; Schliserman and Koren, 2022) and are summarized in Table 1.…”
Section: Realizable Smooth SCO (supporting)
confidence: 68%
“It is widely employed in the stochastic optimization literature to improve the convergence rate of SGD and GD in the overparameterized or realizable setting (Vaswani et al., 2019). Recent papers (Lei and Ying, 2020; Schliserman and Koren, 2022; Nikolakakis et al., 2022) focused on generalization bounds under realizable smooth SCO also suggest that such an assumption improves the sample complexity upper bounds beyond the rate of O(1/√n), which is the well-known result for GD and SGD without the realizability assumption (Hardt et al., 2016).”
Section: Realizable Smooth SCO (mentioning)
confidence: 86%
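As a rough sketch of the comparison made in this citation statement (standard forms from the stability literature; the exact horizons T, step sizes, and constants are as stated in the cited papers, and the O(1/n) realizable rate below is an assumption of this sketch rather than a figure quoted from the text):

\[
\mathbb{E}\,F(w_T) - \min_w F(w) \;=\; O\!\left(1/\sqrt{n}\right)
\quad \text{(general smooth SCO, tuned horizon } T\text{)},
\]
\[
\mathbb{E}\,F(w_T) - \min_w F(w) \;=\; O\!\left(1/n\right)
\quad \text{(realizable smooth SCO, i.e. } \min_w \widehat{F}(w) = 0\text{)}.
\]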