2015
DOI: 10.48550/arxiv.1506.08272
Preprint

Asynchronous Parallel Stochastic Gradient for Nonconvex Optimization

Abstract: Asynchronous parallel implementations of stochastic gradient (SG) have been broadly used in solving deep neural networks and have achieved many successes in practice recently. However, existing theories cannot explain their convergence and speedup properties, mainly due to the nonconvexity of most deep learning formulations and the asynchronous parallel mechanism. To fill these gaps in theory and provide theoretical support, this paper studies two asynchronous parallel implementations of SG: one is over a computer ne…
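
The shared-memory implementation mentioned in the abstract is in the spirit of lock-free (Hogwild!-style) updates, where worker threads read and write a single parameter vector without synchronization, so each stochastic gradient may be computed at a slightly stale iterate. The following is a minimal sketch of that mechanism only; the least-squares objective, step size, thread count, and iteration budget are illustrative assumptions, not the paper's algorithm or experimental setup.

```python
# Minimal sketch of lock-free, shared-memory asynchronous SGD (Hogwild!-style).
# All problem data and hyperparameters below are illustrative assumptions.
import threading
import numpy as np

dim, n_samples, n_threads, n_steps = 10, 1000, 4, 2000
data_rng = np.random.default_rng(0)
A = data_rng.normal(size=(n_samples, dim))
b = data_rng.normal(size=n_samples)
x = np.zeros(dim)        # shared parameter vector, updated without any locking
step_size = 1e-3

def worker(seed):
    rng = np.random.default_rng(seed)
    for _ in range(n_steps):
        snapshot = x.copy()                     # read a possibly stale iterate
        i = rng.integers(n_samples)
        grad = (A[i] @ snapshot - b[i]) * A[i]  # stochastic gradient at the snapshot
        x[:] -= step_size * grad                # lock-free in-place write to shared memory

threads = [threading.Thread(target=worker, args=(s,)) for s in range(n_threads)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print("final mean squared residual:", np.mean((A @ x - b) ** 2))
```

Because each write is based on a gradient computed from an earlier snapshot, the update effectively uses a delayed gradient; bounding that delay is the key assumption in the convergence analyses discussed in the citation statements below.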

Cited by 20 publications (6 citation statements)
References 31 publications
“…These diagonal matrices account for any possible pattern of (potentially) partial updates that can occur while hyperedge s_j is being processed. We would like to note that the above notation bears resemblance to the coordinate-update mismatch formulation of asynchronous coordinate-based algorithms, as in [21,27,28].…”
Section: The Perturbed Iterates View of Asynchrony (mentioning)
confidence: 99%
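
As a rough illustration of the "diagonal matrices for partial updates" notation in the statement above, a perturbed-iterate model can be written as follows; the symbols (x̂_k for the iterate actually read, S_{k,j} for the 0/1 diagonal masks, J(k) for the in-flight updates, γ for the step size) are generic placeholders and not necessarily the cited papers' exact definitions.

```latex
% Hedged sketch of a perturbed-iterate model with diagonal masks for partial updates.
\[
  \hat{x}_k \;=\; x_k \;+\; \gamma \sum_{j \in J(k)} S_{k,j}\, g_j,
  \qquad
  S_{k,j} \;=\; \operatorname{diag}\!\big(s_{k,j,1},\dots,s_{k,j,d}\big),
  \quad s_{k,j,i} \in \{0,1\},
\]
\[
  x_{k+1} \;=\; x_k \;-\; \gamma\, g_k,
  \qquad
  g_k \;=\; \nabla f(\hat{x}_k;\,\xi_k),
\]
% J(k) indexes the updates still in flight when iterate k is read, and the diagonal
% matrix S_{k,j} selects the coordinates of update j that had not yet reached shared memory.
```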
“…In Chaturapruek, Duchi, and Ré (2015), the authors show that because of the noise inherent to the sampling process within SGD, the errors introduced by asynchrony in the shared-memory implementation are asymptotically negligible. A detailed comparison of both the computer-network and shared-memory implementations is given in Lian et al. (2015). Again, the aforementioned asynchronous algorithms are not distributed, since they rely on shared memory or a central coordinator.…”
Section: Parallel SGD (mentioning)
confidence: 99%
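
For context on the comparison mentioned above, the consistent-read (computer-network) setting is commonly modeled with a stale-gradient update of roughly the form below; the mini-batch size M, delay variables τ_{k,m}, and delay bound T are generic notation rather than a verbatim reproduction of Lian et al. (2015).

```latex
% Hedged sketch of an asynchronous, delayed-gradient SG update (consistent reads).
\[
  x_{k+1} \;=\; x_k \;-\; \gamma_k \sum_{m=1}^{M}
  \nabla F\!\big(x_{\,k-\tau_{k,m}};\,\xi_{k,m}\big),
  \qquad 0 \le \tau_{k,m} \le T.
\]
% Each stochastic gradient is evaluated at an iterate that is up to T steps old; the
% shared-memory variant additionally allows the read iterate itself to be inconsistent
% across coordinates.
```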
“…Note that in most of the early literature, AP is used to parallelize optimization-based iterative algorithms. For example, Niu et al. (2011) and Lian et al. (2015) applied asynchronous parallelism to accelerate the stochastic gradient descent algorithm for solving deep learning, primal SVM, matrix completion, etc. Liu and Wright (2014) and Hsieh et al. (2015) proposed asynchronous parallel stochastic coordinate descent for solving dual SVM, LASSO, etc.…”
Section: Related Work (mentioning)
confidence: 99%
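
To contrast with the gradient-based sketch above, the asynchronous stochastic coordinate descent referenced in this statement updates one randomly chosen coordinate at a time from a possibly stale snapshot. The sketch below illustrates that pattern only; the quadratic objective, step size, and thread count are illustrative assumptions, not the cited algorithms' exact settings.

```python
# Minimal sketch of asynchronous stochastic coordinate descent on shared memory.
# Objective: f(x) = 0.5 * x^T Q x - c^T x with Q positive definite (illustrative).
import threading
import numpy as np

dim, n_threads, n_steps = 20, 4, 5000
rng = np.random.default_rng(1)
M = rng.normal(size=(dim, dim))
Q = M @ M.T + dim * np.eye(dim)          # positive definite quadratic
c = rng.normal(size=dim)
x = np.zeros(dim)                        # shared iterate, updated coordinate-wise without locks
step_size = 1.0 / np.max(np.diag(Q))     # conservative coordinate-wise step size

def worker(seed):
    rng_w = np.random.default_rng(seed)
    for _ in range(n_steps):
        snapshot = x.copy()               # possibly stale view of the shared iterate
        i = rng_w.integers(dim)
        partial = Q[i] @ snapshot - c[i]  # partial derivative w.r.t. coordinate i
        x[i] -= step_size * partial       # lock-free single-coordinate write

threads = [threading.Thread(target=worker, args=(s,)) for s in range(n_threads)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print("objective:", 0.5 * x @ Q @ x - c @ x)
```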