Asynchronous Stochastic Coordinate Descent: Parallelism and Convergence Properties

Liu, Ji; Wright, Stephen J.

doi:10.1137/140961134

Cited by 195 publications

(246 citation statements)

References 44 publications

Supporting

Mentioning

237

Contrasting

“…

| Paper | Eff | Blck | Prx | Par | Acc | Notable feature |
|---|---|---|---|---|---|---|
| Leventhal and Lewis '08 [7] | ✓ | × | × | × | × | quadratic f |
| S.-Shwartz and Tewari '09 [22] | ✓ | × | ℓ1 | × | × | 1st ℓ1-regularized |
| Nesterov '10 [15] | × | ✓ | × | × | ✓ | 1st blck & 1st acc |
| Richtárik and Takáč '11 [20] | ✓ | ✓ | ✓ | × | × | 1st proximal |
| Bradley et al '12 [2] | ✓ | × | ℓ1 | ✓ | × | ℓ1-regularized parallel |
| Richtárik and Takáč '12 [21] | ✓ | ✓ | ✓ | ✓ | × | 1st general parallel |
| S.-Shwartz and Zhang '12 [23] | ✓ | ✓ | ✓ | × | × | 1st primal-dual |
| Necoara et al '12 [12] | ✓ | ✓ | × | × | × | 2-coordinate descent |
| Takáč et al '13 [26] | ✓ | × | × | ✓ | × | 1st primal-d. & parallel |
| Tappenden et al '13 [27] | ✓ | ✓ | ✓ | × | × | 1st inexact |
| Necoara and Clipici '13 [11] | ✓ | ✓ | ✓ | × | × | coupled constraints |
| Lin and Xiao '13 [30] | × | ✓ | × | × | ✓ | improvements on [15,20] |
| Fercoq and Richtárik '13 [5] | ✓ | ✓ | ✓ | ✓ | × | 1st nonsmooth f |
| Lee and Sidford '13 [6] | ✓ | × | × | × | ✓ | 1st efficient accelerated |
| Richtárik and Takáč '13 [18] | ✓ | × | ✓ | ✓ | × | 1st distributed |
| Liu et al '13 [9] | ✓ | × | × | ✓ | × | 1st asynchronous |
| S.-Shwartz and Zhang '13 [24] | ✓ | × | ✓ | × | ✓ | acceleration in the primal |
| Richtárik and Takáč '13 [19] | ✓ | × | × | ✓ | × | 1st arbitrary sampling |
| This paper '13 | ✓ | ✓ | ✓ | ✓ | ✓ | 5 times ✓ |

Several variants of proximal and parallel (but nonaccelerated) randomized coordinate descent methods were proposed [2,21,5,18]. In Table 1 we provide a list

Table 2. The methods in this table all arise as special cases of APPROX by varying four elements: the presence and form of the proximal term ψ in the problem formulation ("Prx"), the number of blocks n we decide to split the variable x ∈ R^N into ("Blck"), the choice of the block samplings Ŝ, and the choice of the stepsize parameter θ_k [GD = gradient descent; BCD = block coordinate descent].…”

Section: Paper (mentioning)

confidence: 99%

Accelerated, Parallel, and Proximal Coordinate Descent

Fercoq, Richtárik

2015

SIAM J. Optim.

Abstract. We propose a new randomized coordinate descent method for minimizing the sum of convex functions each of which depends on a small number of coordinates only. Our method (APPROX) is simultaneously Accelerated, Parallel, and PROXimal; this is the first time such a method is proposed. In the special case when the number of processors is equal to the number of coordinates, the method converges at the rate 2ωLR²/(k + 1)², where k is the iteration counter, ω is a data-weighted average degree of separability of the loss function, L is the average of Lipschitz constants associated with the coordinates and individual functions in the sum, and R is the distance of the initial point from the minimizer. We show that the method can be implemented without the need to perform full-dimensional vector operations, which is the major bottleneck of accelerated coordinate descent. The fact that the method depends on the average degree of separability, and not on the maximum degree, can be attributed to the use of new safe large stepsizes, leading to improved expected separable overapproximation (ESO). These are of independent interest and can be utilized in all existing parallel randomized coordinate descent algorithms based on the concept of ESO. In special cases, our method recovers several classical and recent algorithms such as simple and accelerated proximal gradient descent, as well as serial, parallel, and distributed versions of randomized block coordinate descent. Our bounds match or improve on the best known bounds for these methods.
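To make the quoted rate concrete, the following minimal sketch evaluates the expression 2ωLR²/(k + 1)² and finds the smallest iteration counter k at which it drops below a target ε. It assumes the rate is read as an upper bound on the optimality gap after k iterations in the special case named in the abstract; the function names and the numeric values in the usage note are hypothetical and not taken from the paper.

```python
import math


def approx_rate_bound(k, omega, lip, radius):
    """Evaluate 2 * omega * L * R^2 / (k + 1)^2 for iteration counter k."""
    return 2.0 * omega * lip * radius ** 2 / (k + 1) ** 2


def iterations_for_accuracy(eps, omega, lip, radius):
    """Smallest k >= 0 for which the bound above is at most eps."""
    # Solving 2*omega*L*R^2 / (k+1)^2 <= eps gives k >= sqrt(2*omega*L*R^2/eps) - 1.
    k = max(0, math.ceil(math.sqrt(2.0 * omega * lip * radius ** 2 / eps)) - 1)
    # Guard against floating-point rounding in either direction.
    while approx_rate_bound(k, omega, lip, radius) > eps:
        k += 1
    while k > 0 and approx_rate_bound(k - 1, omega, lip, radius) <= eps:
        k -= 1
    return k
```

For example, with the illustrative values ω = 1, L = 2, R = 1, calling `iterations_for_accuracy(0.25, 1.0, 2.0, 1.0)` solves 4/(k + 1)² ≤ 0.25. Because the bound decays like 1/k², reducing ε by a factor of 4 roughly doubles the required k.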

“…Currently, we have not found a counterexample where N^σ L ≤ M for a positive constant σ. Additionally, Assumption 2.4 in this paper seems a little strict. Recently, for the BCGD method with the random rule, some iteration complexity results have been established without Assumption 2.4 [4,5]. In the future, it would be challenging to study the iteration complexity of the BCGD method with the cyclic rule without Assumption 2.4.…”

Section: Discussion (mentioning)

confidence: 99%

Iteration Complexity of a Block Coordinate Gradient Descent Method for Convex Optimization

Hua, Yamashita

2015

SIAM J. Optim.

In this paper, we study the iteration complexity of a block coordinate gradient descent (BCGD) method with a cyclic rule for solving convex optimization problems. We propose a new Lipschitz continuity-like assumption and show that the iteration complexity for the proposed BCGD method can be improved to O(max{M, L}/ε), where M is the constant in the proposed assumption, L is the usual Lipschitz constant for the gradient of the objective function, and ε > 0 is the required precision. In addition, we analyze the relation between M and L, and prove that, in the worst case, M ≤ √N L, where N is the number of blocks.
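As an illustration of what a cyclic rule means in practice, here is a minimal sketch of block coordinate gradient descent on a least-squares objective 0.5·‖Ax − b‖². It is not the method analyzed in the paper: the objective, the step size 1/L_i (with L_i taken as the squared Frobenius norm of the block's columns, a valid upper bound on that block's Lipschitz constant), and all names are assumptions chosen for the example.

```python
def cyclic_bcgd(A, b, blocks, n_sweeps=500):
    """Minimize 0.5 * ||A x - b||^2 by sweeping the blocks in a fixed order.

    A is a list of rows, b a list of floats, and blocks a list of lists of
    column indices that partition the variables (hypothetical interface).
    """
    m, n = len(A), len(A[0])
    x = [0.0] * n
    residual = [-bi for bi in b]  # A x - b at the starting point x = 0

    # Upper bound on each block's gradient Lipschitz constant:
    # squared Frobenius norm of the columns belonging to the block.
    lips = [sum(A[i][j] ** 2 for i in range(m) for j in blk) for blk in blocks]

    for _ in range(n_sweeps):
        # Cyclic rule: visit the blocks in the same order on every sweep.
        for blk, lip in zip(blocks, lips):
            # Block gradient, computed in full before any coordinate moves.
            grad = {j: sum(A[i][j] * residual[i] for i in range(m)) for j in blk}
            for j in blk:
                step = -grad[j] / lip
                x[j] += step
                # Keep the residual in sync with the updated x.
                for i in range(m):
                    residual[i] += A[i][j] * step
    return x
```

A random rule would instead draw the next block at random at each step; the cyclic rule fixes the visiting order, which is the setting the abstract's complexity bound addresses.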

“…Corollary III.1. Let Assumptions II.1, III.1, III.3(II), and III.4 hold, and suppose ρ satisfies (25). Then all conclusions in Theorem III.1 hold true for the sequence generated by (EDANNI) with subproblems being solved inexactly (as quantified above).…”

Section: A. Inexactly Solving the Subproblems (mentioning)

confidence: 95%

Provably Communication-Efficient Asynchronous Distributed Inference for Convex and Nonconvex Problems

Ren, Haupt

2018

2018 IEEE Global Conference on Signal and Information Processing (GlobalSIP)

This paper proposes and analyzes a communication-efficient distributed optimization framework for general nonconvex nonsmooth signal processing and machine learning problems under an asynchronous protocol. At each iteration, worker machines compute gradients of a known empirical loss function using their own local data, and a master machine solves a related minimization problem to update the current estimate. We prove that for nonconvex nonsmooth problems, the proposed algorithm converges with a sublinear rate over the number of communication rounds, coinciding with the best theoretical rate that can be achieved for this class of problems. Linear convergence is established without any statistical assumptions on the local data for problems characterized by composite loss functions whose smooth parts are strongly convex. Extensive numerical experiments verify that the performance of the proposed approach indeed improves, sometimes significantly, over other state-of-the-art algorithms in terms of total communication efficiency.
