2014
DOI: 10.48550/arxiv.1405.4544
Preprint

A distributed block coordinate descent method for training $l_1$ regularized linear classifiers

Dhruv Mahajan, S. Sathiya Keerthi, S. Sundararajan

Abstract: Distributed training of $l_1$ regularized classifiers has received great attention recently. Most existing methods take steps based on a quadratic approximation of the objective that is decoupled at the individual variable level. These methods are designed for multicore and MPI platforms where communication costs are low. They are inefficient on systems such as Hadoop running on a cluster of commodity machines, where communication costs are substantial. In this paper …
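
To make the decoupled quadratic approximation concrete, the sketch below shows a single-coordinate update for an $l_1$ regularized objective, assuming a squared loss; the loss and the function names are illustrative choices, not the paper's exact formulation.

```python
import numpy as np

def soft_threshold(z, t):
    """Proximal operator of t*|.|: closed-form minimizer of a 1-D
    quadratic model plus an L1 term."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def coordinate_update(w, X, y, lam, j):
    """Update a single coordinate of an L1-regularized least-squares model
    using a quadratic approximation decoupled at the variable level.
    Illustrative sketch only, assuming squared loss."""
    r = X @ w - y                      # residual under the current weights
    g = X[:, j] @ r                    # gradient of the smooth loss w.r.t. w_j
    h = X[:, j] @ X[:, j] + 1e-12      # curvature (second derivative) for w_j
    return soft_threshold(w[j] - g / h, lam / h)
```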

Cited by 4 publications (5 citation statements). References 21 publications (69 reference statements).

Citation statements (ordered by relevance):

“…Training algorithms that can be distributed across multiple machines have been the subject of a significant amount of research. Distributed techniques based on stochastic gradient descent have been proposed (see [17] and [18]) as well as methods based on coordinate descent/ascent (see [7], [19], [20], [21] and [22]). These distributed learning algorithms typically involve each machine (or worker) performing a number of optimization steps to approximately minimize the global objective function using the local data that it has available.…”
Section: Distributed Stochastic Learning (citation type: mentioning; confidence: 99%)
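
As a rough illustration of the pattern described above, the sketch below has each worker run a few subgradient steps on its local data shard and then averages the results; the subgradient solver and the averaging rule are assumptions for illustration, not the specific algorithms of the cited works.

```python
import numpy as np

def local_steps(w, X_local, y_local, lam, lr, n_steps):
    """Approximately minimize the objective using only the local shard,
    via a few subgradient steps (illustrative local solver)."""
    w = w.copy()
    for _ in range(n_steps):
        grad = X_local.T @ (X_local @ w - y_local) / len(y_local)
        grad += lam * np.sign(w)       # subgradient of the L1 term
        w -= lr * grad
    return w

def distributed_round(w, shards, lam, lr=0.1, n_steps=5):
    """One communication round: each worker (simulated by the loop) runs
    local_steps on its shard, then the results are averaged."""
    updated = [local_steps(w, X, y, lam, lr, n_steps) for X, y in shards]
    return np.mean(updated, axis=0)
```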
“…(Lee and Roth, 2015) derived an analytical solution for the optimal step size in dual linear support vector machine problems. In addition, (Mahajan et al, 2013) presented a general framework for distributed optimization based on local functional approximation, which includes several first-order and second-order methods as special cases, and (Mahajan et al, 2014) assigned each machine a block of coordinates and proposed distributed block coordinate descent methods for solving ℓ1 regularized loss minimization problems.…”
Section: Related Work (citation type: mentioning; confidence: 99%)
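
A minimal sketch of the feature-partitioned setting mentioned above, in which each machine owns a block of coordinates and updates only that block; the squared loss, the cyclic block order, and the loop-simulated machines are assumptions, not the exact method of (Mahajan et al, 2014).

```python
import numpy as np

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def block_cd_round(w, X, y, lam, blocks):
    """One round of block coordinate descent where each 'machine' owns a
    block of coordinate indices; machines are simulated by the outer loop."""
    for block in blocks:                   # in a real system: one block per machine
        r = X @ w - y                      # residual (would require communication)
        for j in block:
            g = X[:, j] @ r                # gradient for coordinate j
            h = X[:, j] @ X[:, j] + 1e-12  # curvature for coordinate j
            step = soft_threshold(w[j] - g / h, lam / h) - w[j]
            w[j] += step
            r += step * X[:, j]            # keep the residual up to date
    return w

# Example: split 6 features across 2 machines.
# blocks = [np.arange(0, 3), np.arange(3, 6)]
```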
“…A major challenge is to reduce the training time as much as possible when we increase the number of machines. A practical solution requires two research directions: one is to improve the underlying system design to make it suitable for machine learning algorithms (Dean and Ghemawat, 2008; Zaharia et al, 2012; Dean et al, 2012; Li et al, 2014); the other is to adapt traditional single-machine optimization methods to handle data parallelism (Boyd et al, 2011; Yang, 2013; Mahajan et al, 2013; Shamir et al, 2014; Jaggi et al, 2014; Mahajan et al, 2014; Ma et al, 2017; Takáč et al, 2015; Zhang and Lin, 2015). This paper focuses on the latter.…”
Section: Introduction (citation type: mentioning; confidence: 99%)
“…Allen-Zhu and Yuan [2015a] further improve the convergence speed using a novel nonuniform sampling that selects each coordinate with a probability proportional to the square root of the smoothness parameter. Other acceleration techniques (Qu and Richtárik [2014], Nesterov [2012]), as well as mini-batch and distributed variants of the coordinate method (Liu and Wright [2015], Zhao et al [2014], Jaggi et al [2014], Mahajan et al [2014]), have been studied in the literature. See for a review on the coordinate method.…”
Section: Introduction (citation type: mentioning; confidence: 99%)
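
The nonuniform sampling idea mentioned above can be sketched as picking coordinate i with probability proportional to sqrt(L_i), where L_i is the coordinate-wise smoothness (Lipschitz) constant; the helper below is an illustration of that sampling rule, not the cited algorithm itself.

```python
import numpy as np

def smoothness_sampling_probs(L):
    """Sampling probabilities proportional to sqrt(L_i) over coordinates,
    where L_i is the coordinate-wise smoothness constant (illustrative helper)."""
    p = np.sqrt(np.asarray(L, dtype=float))
    return p / p.sum()

# Example: for squared loss, L_i = ||X[:, i]||^2.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))
L = (X ** 2).sum(axis=0)
probs = smoothness_sampling_probs(L)
j = rng.choice(len(L), p=probs)        # coordinate chosen for the next update
```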