2012
DOI: 10.1007/978-3-642-33460-3_40

Stochastic Coordinate Descent Methods for Regularized Smooth and Nonsmooth Losses

Abstract: Stochastic Coordinate Descent (SCD) methods are among the first optimization schemes suggested for efficiently solving large-scale problems. However, until now there has been a gap between the convergence-rate analysis and practical SCD algorithms for general smooth losses, and there has been no primal SCD algorithm for nonsmooth losses. In this paper, we discuss these issues using the recently developed structural optimization techniques. In particular, we first present a principled and practical SCD algorithm for re…
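For intuition, the setting the abstract describes (a smooth data-fitting loss plus a separable nonsmooth regularizer, minimized one randomly sampled coordinate at a time) can be sketched as follows. This is a generic proximal randomized coordinate descent applied to the lasso, not the paper's specific algorithm; the function names and step-size choices are illustrative.

```python
import numpy as np

def soft_threshold(z, t):
    """Proximal operator of t*|.|, applied elementwise."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def scd_lasso(A, b, lam, n_iters=10_000, seed=0):
    """Randomized proximal coordinate descent for
        min_x 0.5*||Ax - b||^2 + lam*||x||_1.
    Each step samples one coordinate j uniformly and takes a prox step
    with step size 1/L_j, where L_j = ||A[:, j]||^2 is the coordinate-wise
    Lipschitz constant of the smooth part."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    x = np.zeros(n)
    residual = A @ x - b                      # kept up to date incrementally
    L = np.maximum(np.sum(A * A, axis=0), 1e-12)  # guard all-zero columns
    for _ in range(n_iters):
        j = rng.integers(n)
        g_j = A[:, j] @ residual              # partial gradient at coordinate j
        x_new = soft_threshold(x[j] - g_j / L[j], lam / L[j])
        residual += A[:, j] * (x_new - x[j])  # O(m) update instead of O(mn)
        x[j] = x_new
    return x
```

The incremental residual update is what keeps the per-iteration cost at O(m) rather than O(mn), which is the usual argument for coordinate methods on large-scale problems.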

Cited by 13 publications (10 citation statements; every statement below is flagged as "mentioning"). References 21 publications (66 reference statements). Citing publications appeared between 2014 and 2019.

Citation statements, ordered by relevance:
“…Our work belongs to a growing literature on randomized methods for various problems appearing in linear algebra, optimization and computer science. In particular, relevant methods include sketching algorithms, randomized Kaczmarz, stochastic gradient descent and their variants [55,31,9,38,66,33,34,45,50,56,17,47,19,62,8,18,65,35,7,32,26,14,40,23] and randomized coordinate and subspace type methods and their variants [21,16,51,36,60,3,48,37,49,57,28,58,30,46,11,53,10,12,20,41,13,42,43,64,44,…”
Section: A New Family of Stochastic Optimization Algorithms (mentioning)
confidence: 99%
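As a concrete member of the family this statement surveys, here is a minimal randomized Kaczmarz sketch for a consistent linear system Ax = b, with rows sampled proportionally to their squared norms (the Strohmer-Vershynin variant); function names and iteration counts are illustrative.

```python
import numpy as np

def randomized_kaczmarz(A, b, n_iters=5_000, seed=0):
    """Randomized Kaczmarz for a consistent system Ax = b.
    Each step orthogonally projects the iterate onto the hyperplane
    defined by one row, sampled with probability proportional to its
    squared norm."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    row_norms_sq = np.sum(A * A, axis=1)
    probs = row_norms_sq / row_norms_sq.sum()
    x = np.zeros(n)
    for _ in range(n_iters):
        i = rng.choice(m, p=probs)
        # projection onto {x : <a_i, x> = b_i}
        x += (b[i] - A[i] @ x) / row_norms_sq[i] * A[i]
    return x
```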
“…When the function is neither smooth nor composite, it is still possible to define coordinate descent methods based on subgradients. An algorithm based on averaging past subgradient coordinates is presented in [34], and a successful subgradient-based coordinate descent method for problems with sparse subgradients is proposed by Nesterov [20]. Tappenden et al. [36] analyzed an inexact randomized coordinate descent method in which the proximal subproblems at each iteration are solved only approximately.…”
Section: Brief Literature Review (mentioning)
confidence: 99%
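One plausible reading of "averaging of past subgradient coordinates" is a dual-averaging-style coordinate scheme; the sketch below applies that idea to a fully nonsmooth l1-regression problem. It is illustrative only and not the exact algorithm of [34]; the step scaling gamma/sqrt(t) is the standard dual-averaging choice, not taken from that paper.

```python
import numpy as np

def coord_dual_averaging(A, b, lam, n_iters=20_000, gamma=0.1, seed=0):
    """Dual-averaging-style randomized coordinate subgradient sketch for
        min_x ||Ax - b||_1 + lam*||x||_1   (both terms nonsmooth).
    Each step samples a coordinate, records its partial subgradient, and
    resets that coordinate from the accumulated subgradients."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    x = np.zeros(n)
    r = A @ x - b                 # residual, maintained incrementally
    g_sum = np.zeros(n)           # accumulated partial subgradients
    for t in range(1, n_iters + 1):
        j = rng.integers(n)
        # one element of the subdifferential at coordinate j
        g_sum[j] += A[:, j] @ np.sign(r) + lam * np.sign(x[j])
        x_new = -gamma * g_sum[j] / np.sqrt(t)   # dual-averaging update
        r += A[:, j] * (x_new - x[j])
        x[j] = x_new
    return x
```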
“…
Paper | Rate | Const | PD-CD | Notable feature
Platt '99 [43] | × | ✓ | P | for SVM
Tseng & Yun '09 [61] | ✓ | ✓ | P | adapts Gauss-Southwell rule
Tao et al. '12 [57] | ✓ | × | P | uses averages of subgradients
Necoara et al. '12 [38] | ✓ | ✓ | P | 2-coordinate descent
Nesterov '12 [40] | ✓ | × | P | uses subgradients
Necoara & Clipici '13 [37] | ✓ | ✓ | P | coupled constraints
Combettes & Pesquet '14 [11] | × | ✓ | ✓ | 1st PD-CD, short step sizes
Bianchi et al. '14 [5] | × | ✓ | ✓ | distributed optimization
Hong et al. '14 [24] | × | ✓ | × | updates all dual variables
Fercoq & Richtárik '17 [19] | ✓ | × | P | uses smoothing
Alacaoglu et al. '17 [1] | ✓ | ✓ | ✓ | 1st PD-CD with rate for constraints
Xu & Zhang '18 [66] | ✓ | ✓ | × | better rate than [21]
Chambolle et al. '18 [9] | ✓ | ✓ | × | updates all primal variables
Fercoq & Bianchi '19 [15] | ✓ | ✓ | ✓ | 1st PD-CD with long step sizes
Gao et al. '19 [21] | ✓ | ✓ | × | 1st primal-dual with rate for constraints
Latafat et al. '19 [26] | ✓ | ✓ | ✓ | linear convergence with growth condition

Table 3. Selected papers for the minimization of non-smooth non-separable functions.…”
Section: Non-convex Functions (mentioning)
confidence: 99%
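To make the PD-CD column of the table concrete, below is a sketch in the spirit of the stochastic primal-dual hybrid gradient method of Chambolle et al. '18 listed above (all primal variables updated, one dual coordinate sampled per step), applied to min_x lam*||x||_1 + ||Ax - b||_1. The problem instance, step sizes, and names are illustrative assumptions, not the published algorithm verbatim.

```python
import numpy as np

def spdhg_l1(A, b, lam, n_iters=20_000, seed=0):
    """SPDHG-style primal-dual coordinate sketch for
        min_x lam*||x||_1 + ||Ax - b||_1.
    One dual coordinate (one row of A) is updated per iteration, while the
    primal soft-thresholding step updates all primal variables."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    p = 1.0 / m                                   # uniform row sampling
    row_norms = np.maximum(np.linalg.norm(A, axis=1), 1e-12)
    sigma = 0.9 / row_norms                       # per-row dual step sizes
    tau = 0.9 * p / row_norms.max()               # conservative primal step
    x = np.zeros(n)
    y = np.zeros(m)                               # dual variables, |y_i| <= 1
    z = A.T @ y                                   # running A^T y
    zbar = z.copy()                               # extrapolated copy
    for _ in range(n_iters):
        # primal prox step: soft-thresholding for lam*||x||_1
        v = x - tau * zbar
        x = np.sign(v) * np.maximum(np.abs(v) - tau * lam, 0.0)
        # sample one dual coordinate; prox of the conjugate of |t - b_i|
        # is a clip to [-1, 1] after shifting by sigma_i * b_i
        i = rng.integers(m)
        y_new = np.clip(y[i] + sigma[i] * (A[i] @ x - b[i]), -1.0, 1.0)
        delta = A[i] * (y_new - y[i])
        y[i] = y_new
        z = z + delta
        zbar = z + (1.0 / p) * delta              # extrapolation step
    return x
```

The extrapolated variable zbar compensates for updating only one sampled dual coordinate per step, which is the design choice that lets the dual work be spread across iterations while the primal step stays a full proximal update.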