2013
DOI: 10.1007/978-3-642-40994-3_2

Parallel Boosting with Momentum

Abstract: We describe a new, simplified, and general analysis of a fusion of Nesterov's accelerated gradient with parallel coordinate descent. The resulting algorithm, which we call BOOM, for boosting with momentum, enjoys the merits of both techniques. Namely, BOOM retains the momentum and convergence properties of the accelerated gradient method while taking into account the curvature of the objective function. We describe a distributed implementation of BOOM which is suitable for massive high dimensional d…
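The abstract sketches the idea behind BOOM: combine the momentum of Nesterov's accelerated gradient with the per-coordinate curvature scaling of parallel coordinate descent. The paper's actual update rules are not reproduced in this report, so the snippet below is only a minimal illustrative sketch of that combination on a least-squares objective; it uses a heavy-ball-style momentum term rather than Nesterov's scheme, a conservative 1/n safety factor for the simultaneous update, and hypothetical function and variable names. It is not the authors' algorithm.

```python
import numpy as np

def momentum_parallel_cd(A, b, num_iters=500, momentum=0.9):
    """Illustrative sketch only: minimize f(x) = 0.5 * ||A x - b||^2 with
    per-coordinate curvature scaling (parallel coordinate descent flavour)
    plus a heavy-ball momentum term (accelerated-gradient flavour).
    This is NOT the BOOM algorithm from the paper."""
    n = A.shape[1]
    # Per-coordinate curvature: L[j] = ||A[:, j]||^2 is the Lipschitz
    # constant of the j-th partial derivative for this quadratic objective.
    L = np.sum(A ** 2, axis=0) + 1e-12
    x = np.zeros(n)
    velocity = np.zeros(n)
    for _ in range(num_iters):
        grad = A.T @ (A @ x - b)        # all coordinate gradients at once
        # Scale every coordinate by its own curvature; the 1/n factor keeps
        # the simultaneous update of coupled coordinates stable.
        step = grad / (n * L)
        velocity = momentum * velocity - step
        x = x + velocity                # update all coordinates in parallel
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((200, 50))
    x_true = rng.standard_normal(50)
    b = A @ x_true
    x_hat = momentum_parallel_cd(A, b)
    print("relative error:", np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true))
```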

Cited by 19 publications (19 citation statements). References 11 publications (16 reference statements).
“…The training points were generated randomly as described in [13], with N = 7000 and n = 50. To establish a reference benchmark with a well known algorithm, we used the particular implementation [13] of one of the coordinate descent (CD) methods of Tseng and Yun [26]. Figure 1 reports the performance of SGD (with β = 7) and SQN (with β = 2), as measured by accessed data points.…”
Section: Experiments With Synthetic Datasets · mentioning · confidence: 99%
“…Parallel methods were considered in [2,19,21], and more recently in [1,5,6,12,13,25,27,28]. A memory distributed method scaling to big data problems was recently developed in [22].…”
Section: Literature · mentioning · confidence: 99%
“…- an asynchronous version of Parallel Coordinate Descent with τ-independent sampling (τ = 16) (Algorithm 2), based on the code of [11] which is freely available; the τ-independent sampling is a good approximation of the τ-nice sampling for τ ≪ n,
- the fully parallel coordinate descent method [6],
- the accelerated version of the fully parallel coordinate descent method [7],
- the classical Adaboost algorithm (greedy coordinate descent); we performed the search for the largest absolute value of the gradient in parallel.…”
[Figure caption: Comparison of algorithms for the resolution of the Adaboost problem on the URL reputation dataset with 16 processors (same colours as in Figure 1).]
Section: Numerical Experiments · mentioning · confidence: 99%
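The excerpt above contrasts the classical Adaboost step, a greedy (largest-absolute-gradient) coordinate choice, with randomized parallel coordinate descent driven by τ-independent sampling, in which each coordinate is included independently so that τ coordinates are updated in expectation. As a rough illustration of those two selection rules only (not of the algorithms in [6], [7], or [11]), here is a small sketch with hypothetical helper names:

```python
import numpy as np

def greedy_coordinate(grad):
    """Greedy (largest-absolute-gradient) rule used by classical Adaboost-style
    coordinate descent: pick the coordinate whose gradient entry has the
    largest absolute value. The arg-max itself can be computed in parallel."""
    return int(np.argmax(np.abs(grad)))

def tau_independent_sample(n, tau, rng):
    """tau-independent sampling: include each coordinate independently with
    probability tau / n, so tau coordinates are chosen in expectation. For
    tau << n this approximates the tau-nice sampling, which picks exactly
    tau coordinates uniformly at random."""
    mask = rng.random(n) < tau / n
    return np.flatnonzero(mask)

# Minimal usage on a synthetic gradient vector (illustrative only).
rng = np.random.default_rng(0)
grad = rng.standard_normal(1000)
print("greedy pick:", greedy_coordinate(grad))
print("sampled block:", tau_independent_sample(n=1000, tau=16, rng=rng))
```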