“…More recently, gradient tracking has been utilized to further enhance the convergence rate of new methods; see (Lu et al. 2019; Zhang and You 2020; Koloskova, Lin, and Stich 2021; Xin, Khan, and Kar 2021b) for further discussion. Variance reduction methods that mimic the updates of SARAH (Nguyen et al. 2017b) and SPIDER (Wang et al. 2019) attain optimal gradient complexity at the expense of large-batch computations; examples include D-SPIDER-SFO (Pan, Liu, and Wang 2020), D-GET (Sun, Lu, and Hong 2020), GT-SARAH (Xin, Khan, and Kar 2022), and DE-STRESS (Li, Li, and Chi 2022). To avoid the large-batch requirement of these methods, the STORM (Cutkosky and Orabona 2019; Xu and Xu 2023) and Hybrid-SGD (Tran-Dinh et al. 2022a) methods have also been adapted to the decentralized setting; see GT-STORM (Zhang et al. 2021b) and GT-HSGD (Xin, Khan, and Kar 2021a).…”
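To make the recursive-momentum idea behind STORM concrete, the following is a minimal single-node sketch (not the decentralized GT-STORM or GT-HSGD algorithms themselves) on an assumed toy least-squares objective; the step size, momentum parameter, and batch size are illustrative choices, not values from any of the cited papers. The key point is that each fresh minibatch is evaluated at both the new and the previous iterate, so the estimator corrects itself without ever requiring a large batch:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stochastic objective (an assumption for illustration):
# f(x) = 0.5 * E[(a @ x - b)^2] over rows (a, b) of a synthetic dataset.
A = rng.normal(size=(200, 5))
x_star = rng.normal(size=5)   # planted solution; b = A @ x_star exactly
b = A @ x_star

def stoch_grad(x, idx):
    """Minibatch gradient of the least-squares loss at x on samples idx."""
    a = A[idx]
    return a.T @ (a @ x - b[idx]) / len(idx)

def storm(T=500, eta=0.05, a_mom=0.1, batch=10):
    """STORM-style recursive momentum estimator (single-node illustration).

    d_t = grad(x_t; xi_t) + (1 - a_mom) * (d_{t-1} - grad(x_{t-1}; xi_t)),
    i.e., the SAME fresh sample xi_t is evaluated at both iterates.
    """
    x = np.zeros(5)
    idx = rng.integers(0, len(A), batch)
    d = stoch_grad(x, idx)          # initialize with a plain stochastic gradient
    for _ in range(T):
        x_new = x - eta * d         # descend along the current estimator
        idx = rng.integers(0, len(A), batch)
        d = stoch_grad(x_new, idx) + (1 - a_mom) * (d - stoch_grad(x, idx))
        x = x_new
    return x

x_hat = storm()
print(np.linalg.norm(x_hat - x_star))
```

In the decentralized variants cited above, each node maintains such an estimator locally and combines it with gossip averaging and gradient tracking across neighbors; this sketch isolates only the small-batch variance-reduction mechanism.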