Optimal Complexity and Certification of Bregman First-Order Methods

Dragomir, Radu-Alexandru; Taylor, Adrien; d'Aspremont, Alexandre; Bolte, Jérôme

doi:10.48550/arxiv.1911.08510

Cited by 7 publications

(11 citation statements)

References 32 publications

(96 reference statements)

“…Firstly, in terms of the dependence on β and µ our algorithm achieves the lower bound (13) obtained in [31]. Secondly, we compare our bound with the bound (14) of the DISCO algorithm [32], which unlike other works [23], [24], [25], [26], [27], [28], [29], [30] also achieves the lower bound in terms of the dependence on β and µ. Since the dependence on these parameters in (14) and in our bound (34) are the same, we compare the other parts of the complexity bound.…”

Section: Achieving the Lower Bound For Finite-sum Optimization Under ... (mentioning)

confidence: 64%

“…This idea have been recently extensively exploited for optimization problems (mainly) over master/workers architectures, under the name of statistical preconditioning [23], [24], [25], [26], [27], [28], [29], [30]. These papers focus on solving the finite-sum problem (7) and most of them do not achieve the lower communication complexity bound for this setting obtained in [31]…”

Section: Introduction (mentioning)

confidence: 99%

An Accelerated Second-Order Method for Distributed Stochastic Optimization

Artem¹, Dvurechensky², Scutari³ et al. 2021

Preprint

We consider distributed stochastic optimization problems that are solved with master/workers computation architecture. Statistical arguments allow to exploit statistical similarity and approximate this problem by a finite-sum problem, for which we propose an inexact accelerated cubic-regularized Newton's method that achieves lower communication complexity bound for this setting and improves upon existing upper bound. We further exploit this algorithm to obtain convergence rate bounds for the original stochastic optimization problem and compare our bounds with the existing bounds in several regimes when the goal is to minimize the number of communication rounds and increase the parallelization by increasing the number of workers.

“…An obvious one is incorporating acceleration to improve communication complexity bounds. The results of (Hendrikx et al, 2020a) and (Dragomir et al, 2019) imply that the performance of first-order methods is limited even for centralized architectures. Secondly, other ERM algorithms should be evaluated through the lens of main target being statistical precision.…”

Section: Discussion (mentioning)

confidence: 99%

Newton Method over Networks is Fast up to the Statistical Precision

Daneshmand¹, Scutari², Dvurechensky³ et al. 2021

Preprint

We propose a distributed cubic regularization of the Newton method for solving (constrained) empirical risk minimization problems over a network of agents, modeled as undirected graph. The algorithm employs an inexact, preconditioned Newton step at each agent's side: the gradient of the centralized loss is iteratively estimated via a gradient-tracking consensus mechanism and the Hessian is subsampled over the local data sets. No Hessian matrices are thus exchanged over the network. We derive global complexity bounds for convex and strongly convex losses. Our analysis reveals an interesting interplay between sample and iteration/communication complexity: statistically accurate solutions are achievable in roughly the same number of iterations of the centralized cubic Newton method, with a communication cost per iteration of the order of O(1/√(1 − ρ)), where ρ characterizes the connectivity of the network. This demonstrates a significant communication saving with respect to that of existing, statistically oblivious, distributed Newton-based methods over networks.

“…A direct acceleration of the mirror method, achieving O(β/µ) over star-networks [Lu et al, 2020], does not seem possible in general [Dragomir et al, 2019].…”

Section: Contributions (mentioning)

confidence: 99%

“…For quadratic losses, DANE achieves communication complexity O((β/µ)² log 1/ε). More recently, [Fan et al, 2019] proposed CEASE, which achieves DANE's complexity but for nonquadratic losses and r = 0. Applying the convergence analysis of mirror descent in [Lu et al, 2020] to CEASE enhances its rate to O((β/µ) log 1/ε).…”

Section: Related Work (mentioning)

confidence: 99%
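
To make the gap between the two rates quoted in the Related Work statement concrete, here is a minimal Python sketch. It evaluates the expressions (β/µ)² log(1/ε) and (β/µ) log(1/ε) with all hidden constants dropped, so the outputs are order-of-magnitude indicators only, not actual communication counts. The function names and the sample values of β/µ and ε are illustrative assumptions and do not come from the cited papers.

```python
import math


def rounds_squared_rate(kappa: float, eps: float) -> float:
    """(beta/mu)^2 * log(1/eps), with constants dropped (hypothetical helper)."""
    return kappa ** 2 * math.log(1.0 / eps)


def rounds_linear_rate(kappa: float, eps: float) -> float:
    """(beta/mu) * log(1/eps), with constants dropped (hypothetical helper)."""
    return kappa * math.log(1.0 / eps)


# Illustrative values only: relative condition number beta/mu and target accuracy.
kappa = 10.0
eps = 1e-6

squared = rounds_squared_rate(kappa, eps)
linear = rounds_linear_rate(kappa, eps)

# The log(1/eps) factor cancels, so the ratio equals beta/mu itself.
ratio = squared / linear
print(f"squared rate: {squared:.1f}, linear rate: {linear:.1f}, ratio: {ratio:.1f}")
```

Under these assumptions, the improvement from the squared rate to the linear rate is a factor of β/µ, independent of the target accuracy ε.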

Acceleration in Distributed Optimization under Similarity

Scutari¹, Cao², Gasnikov³ 2021

Preprint

We study distributed (strongly convex) optimization problems over a network of agents, with no centralized nodes. The loss functions of the agents are assumed to be similar, due to statistical data similarity or otherwise. In order to reduce the number of communications to reach a solution accuracy, we proposed a preconditioned, accelerated distributed method. An(1−ρ) log 1/ε number of communications steps, where β/µ is the relative condition number between the global and local loss functions, and ρ characterizes the connectivity of the network. This rate matches (up to poly-log factors) for the first time lower complexity communication bounds of distributed gossip-algorithms applied to the class of problems of interest. Numerical results show significant communication savings with respect to existing accelerated distributed schemes, especially when solving ill-conditioned problems.
