2015
DOI: 10.1137/140961134

Asynchronous Stochastic Coordinate Descent: Parallelism and Convergence Properties

Abstract: We describe an asynchronous parallel stochastic proximal coordinate descent algorithm for minimizing a composite objective function, which consists of a smooth convex function added to a separable convex function. In contrast to previous analyses, our model of asynchronous computation accounts for the fact that components of the unknown vector may be written by some cores simultaneously with being read by others. Despite the complications arising from this possibility, the method achieves a linear convergence …
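The abstract names the building blocks (a smooth convex term, a separable convex term, randomly chosen coordinates, a proximal step) without showing a concrete iteration. The following is a minimal serial sketch of stochastic proximal coordinate descent, not the paper's asynchronous parallel method. It assumes, purely for illustration, a lasso objective 0.5*||Ax - b||^2 + lam*||x||_1 with coordinate step sizes set from the squared column norms; the function names and parameters are hypothetical and do not come from the paper.

```python
import random


def soft_threshold(v, t):
    """Proximal operator of t * |.|: shrink v toward zero by t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def objective(A, b, x, lam):
    """Composite objective: smooth least-squares part plus separable l1 part."""
    m, n = len(A), len(x)
    res = [sum(A[r][j] * x[j] for j in range(n)) - b[r] for r in range(m)]
    return 0.5 * sum(v * v for v in res) + lam * sum(abs(v) for v in x)


def prox_coordinate_descent(A, b, lam, iters=2000, seed=0):
    """Serial stochastic proximal coordinate descent (illustrative assumption:
    one coordinate per iteration, sampled uniformly at random)."""
    rng = random.Random(seed)
    m, n = len(A), len(A[0])
    x = [0.0] * n
    residual = [-bi for bi in b]  # equals A x - b at x = 0
    # Coordinate-wise Lipschitz constants of the smooth part.
    col_sq = [sum(A[r][j] ** 2 for r in range(m)) for j in range(n)]
    for _ in range(iters):
        i = rng.randrange(n)
        if col_sq[i] == 0.0:
            continue  # the smooth part does not depend on this coordinate
        grad_i = sum(A[r][i] * residual[r] for r in range(m))
        new_xi = soft_threshold(x[i] - grad_i / col_sq[i], lam / col_sq[i])
        delta = new_xi - x[i]
        if delta != 0.0:
            # Keep the residual consistent with the updated coordinate.
            for r in range(m):
                residual[r] += A[r][i] * delta
            x[i] = new_xi
    return x


if __name__ == "__main__":
    A = [[1.0, 0.0], [0.0, 1.0]]
    b = [3.0, -0.5]
    x = prox_coordinate_descent(A, b, lam=1.0)
    print(x, objective(A, b, x, 1.0))
```

In an asynchronous setting of the kind the abstract describes, several cores would run the loop body concurrently on a shared `x`, so the gradient could be computed from a partially updated vector; this sketch deliberately omits that aspect.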

citations
Cited by 195 publications
(246 citation statements)
references
References 44 publications
“…
                                 Eff  Blck  Prx  Par  Acc  Notable feature
Leventhal and Lewis '08 [7]      ✓    ×     ×    ×    ×    quadratic f
S.-Shwartz and Tewari '09 [22]   ✓    ×     ℓ1   ×    ×    1st ℓ1-regularized
Nesterov '10 [15]                ×    ✓     ×    ×    ✓    1st blck & 1st acc
Richtárik and Takáč '11 [20]     ✓    ✓     ✓    ×    ×    1st proximal
Bradley et al '12 [2]            ✓    ×     ℓ1   ✓    ×    ℓ1-regularized parallel
Richtárik and Takáč '12 [21]     ✓    ✓     ✓    ✓    ×    1st general parallel
S.-Shwartz and Zhang '12 [23]    ✓    ✓     ✓    ×    ×    1st primal-dual
Necoara et al '12 [12]           ✓    ✓     ×    ×    ×    2-coordinate descent
Takáč et al '13 [26]             ✓    ×     ×    ✓    ×    1st primal-d. & parallel
Tappenden et al '13 [27]         ✓    ✓     ✓    ×    ×    1st inexact
Necoara and Clipici '13 [11]     ✓    ✓     ✓    ×    ×    coupled constraints
Lin and Xiao '13 [30]            ×    ✓     ×    ×    ✓    improvements on [15,20]
Fercoq and Richtárik '13 [5]     ✓    ✓     ✓    ✓    ×    1st nonsmooth f
Lee and Sidford '13 [6]          ✓    ×     ×    ×    ✓    1st efficient accelerated
Richtárik and Takáč '13 [18]     ✓    ×     ✓    ✓    ×    1st distributed
Liu et al '13 [9]                ✓    ×     ×    ✓    ×    1st asynchronous
S.-Shwartz and Zhang '13 [24]    ✓    ×     ✓    ×    ✓    acceleration in the primal
Richtárik and Takáč '13 [19]     ✓    ×     ×    ✓    ×    1st arbitrary sampling
This paper '13                   ✓    ✓     ✓    ✓    ✓    5 times ✓

Several variants of proximal and parallel (but nonaccelerated) randomized coordinate descent methods were proposed [2,21,5,18]. In Table 1 we provide a list

Table 2: The methods in this table all arise as special cases of APPROX by varying four elements: the presence and form of the proximal term ψ in the problem formulation ("Prx"), the number of blocks n we decide to split the variable x ∈ R^N into ("Blck"), the choice of the block samplings Ŝ, and the choice of the stepsize parameter θ_k [GD = gradient descent; BCD = block coordinate descent].…”
Section: Paper (mentioning)
confidence: 99%
“…Currently, we have not found a counterexample where N σ L ≤ M for a positive constant σ. Additionally, Assumption 2.4 in this paper seems a little strict. Recently, for the BCGD method with the random rule, some iteration complexity results have been established without Assumption 2.4 [4,5]. In the future, it would be challenging to study the iteration complexity of the BCGD method with the cyclic rule without Assumption 2.4.…”
Section: Discussion (mentioning)
confidence: 99%
“…Corollary III.1. Let Assumptions II.1, III.1, III.3(II), and III.4 hold, and suppose ρ satisfies (25). Then all conclusions in Theorem III.1 hold true for the sequence generated by (EDANNI) with subproblems being solved inexactly (as quantified above).…”
Section: A. Inexactly Solving the Subproblems (mentioning)
confidence: 95%