2022
DOI: 10.48550/arxiv.2203.01400
Preprint

Adaptive Gradient Methods with Local Guarantees

Abstract: Adaptive gradient methods are the method of choice for optimization in machine learning and are used to train the largest deep models. In this paper we study the problem of learning a local preconditioner that can change as the data changes along the optimization trajectory. We propose an adaptive gradient method that has provable adaptive regret guarantees vs. the best local preconditioner. To derive this guarantee, we prove a new adaptive regret bound in online learning that improves upon previous adaptive …
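For context on the kind of preconditioning the abstract refers to, below is a minimal sketch of a classical diagonal AdaGrad-style update, in which the preconditioner is built from accumulated squared gradients. This is an illustration only, not the paper's local adaptive-regret algorithm; the function name, learning rate, and toy objective are assumptions.

```python
import numpy as np

def adagrad_step(x, grad, accum, lr=0.1, eps=1e-8):
    """One diagonal AdaGrad-style update (illustrative sketch).

    The preconditioner is the inverse square root of the accumulated
    squared gradients; lr and eps are assumed hyperparameters.
    """
    accum = accum + grad ** 2                    # running second-moment accumulator
    x = x - lr * grad / (np.sqrt(accum) + eps)   # preconditioned gradient step
    return x, accum

# Usage sketch on a toy quadratic f(x) = 0.5 * x^T A x
A = np.diag([10.0, 1.0])
x = np.array([1.0, 1.0])
accum = np.zeros_like(x)
for _ in range(100):
    grad = A @ x
    x, accum = adagrad_step(x, grad, accum)
print(x)  # approaches the minimizer at the origin
```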


Cited by 2 publications (2 citation statements)
References 13 publications
“…We employ switching regret for non-convex optimization. More refined analysis may be possible via generalizations such as strongly adaptive or dynamic regret (Daniely et al., 2015; Jun et al., 2017; Zhang et al., 2018; Jacobsen & Cutkosky, 2022; Cutkosky, 2020; Lu et al., 2022; Luo et al., 2022; Zhang et al., 2021; Baby & Wang, 2022; Zhang et al., 2022). Moreover, our analysis assumes perfect tuning of constants (e.g., D, T, K) for simplicity.…”
Section: Discussion
confidence: 99%
“…This bound was further improved to $O(\sqrt{|I| \log T})$ by [9] using a coin-betting technique. Recently, [2] achieved a more refined second-order bound $\tilde{O}\big(\sqrt{\sum_{t \in I} \|\nabla_t\|^2}\big)$, and [10] further improved it to $\tilde{O}\big(\sqrt{\min_{H \succeq 0,\ \mathrm{Tr}(H) \le d} \sum_{t \in I} \nabla_t^\top H^{-1} \nabla_t}\big)$, which matches the regret of AdaGrad [4]. However, these algorithms are all based on the initial exponential-lookback technique of [7], and require $\Theta(\log T)$ experts per round, increasing the computational complexity of the base algorithm in their reduction by this factor.…”
Section: Related Work
confidence: 99%
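The excerpt above refers to the exponential-lookback (geometric covering) construction, in which every round belongs to one interval per scale, so a strongly adaptive reduction must run $\Theta(\log T)$ base experts simultaneously. The following is a minimal illustrative sketch of how the active intervals at a given round can be enumerated; it is not code from any of the cited papers, and the function name and interval alignment are assumptions.

```python
def active_geometric_intervals(t: int):
    """Return the geometric covering intervals (scale, start, end) containing round t.

    Illustrative sketch: for each scale k with 2**k <= t, exactly one interval
    of length 2**k (aligned to multiples of 2**k) covers t, so the number of
    active intervals -- and hence of base experts the reduction maintains --
    grows as Theta(log t).
    """
    intervals = []
    k = 0
    while 2 ** k <= t:
        start = (t // 2 ** k) * 2 ** k   # left endpoint aligned to a multiple of 2**k
        end = start + 2 ** k - 1         # interval of length 2**k
        intervals.append((k, start, end))
        k += 1
    return intervals

if __name__ == "__main__":
    for t in (1, 7, 1024):
        acts = active_geometric_intervals(t)
        print(f"t={t}: {len(acts)} active experts", acts)
```

Running the sketch shows, e.g., 3 active intervals at t=7 and 11 at t=1024, matching the logarithmic per-round overhead the citation statement describes.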