2015
DOI: 10.48550/arxiv.1506.08272
Preprint

Asynchronous Parallel Stochastic Gradient for Nonconvex Optimization

Abstract: Asynchronous parallel implementations of stochastic gradient (SG) have been broadly used in solving deep neural networks and have achieved many successes in practice recently. However, existing theories cannot explain their convergence and speedup properties, mainly due to the nonconvexity of most deep learning formulations and the asynchronous parallel mechanism. To fill these gaps in theory and provide theoretical support, this paper studies two asynchronous parallel implementations of SG: one is over a computer ne…
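
The shared-memory implementation mentioned in the abstract is in the spirit of lock-free (Hogwild!-style) updates, where worker threads read and write a single parameter vector without synchronization, so each stochastic gradient may be computed at a slightly stale iterate. The following is a minimal sketch of that mechanism only; the least-squares objective, step size, thread count, and iteration budget are illustrative assumptions, not the paper's algorithm or experimental setup.

```python
# Minimal sketch of lock-free, shared-memory asynchronous SGD (Hogwild!-style).
# All problem data and hyperparameters below are illustrative assumptions.
import threading
import numpy as np

dim, n_samples, n_threads, n_steps = 10, 1000, 4, 2000
data_rng = np.random.default_rng(0)
A = data_rng.normal(size=(n_samples, dim))
b = data_rng.normal(size=n_samples)
x = np.zeros(dim)        # shared parameter vector, updated without any locking
step_size = 1e-3

def worker(seed):
    rng = np.random.default_rng(seed)
    for _ in range(n_steps):
        snapshot = x.copy()                     # read a possibly stale iterate
        i = rng.integers(n_samples)
        grad = (A[i] @ snapshot - b[i]) * A[i]  # stochastic gradient at the snapshot
        x[:] -= step_size * grad                # lock-free in-place write to shared memory

threads = [threading.Thread(target=worker, args=(s,)) for s in range(n_threads)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print("final mean squared residual:", np.mean((A @ x - b) ** 2))
```

Because each write is based on a gradient computed from an earlier snapshot, the update effectively uses a delayed gradient; bounding that delay is the key assumption in the convergence analyses discussed in the citation statements below.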

Cited by 20 publications (6 citation statements)
References 31 publications
“…These diagonal matrices account for any possible pattern of (potentially) partial updates that can occur while hyperedge s_j is being processed. We would like to note that the above notation bears resemblance to the coordinate-update mismatch formulation of asynchronous coordinate-based algorithms, as in [21,27,28].…”
Section: The Perturbed Iterates View of Asynchrony (mentioning)
confidence: 99%
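
As a rough illustration of the "diagonal matrices for partial updates" notation in the statement above, a perturbed-iterate model can be written as follows; the symbols (x̂_k for the iterate actually read, S_{k,j} for the 0/1 diagonal masks, J(k) for the in-flight updates, γ for the step size) are generic placeholders and not necessarily the cited papers' exact definitions.

```latex
% Hedged sketch of a perturbed-iterate model with diagonal masks for partial updates.
\[
  \hat{x}_k \;=\; x_k \;+\; \gamma \sum_{j \in J(k)} S_{k,j}\, g_j,
  \qquad
  S_{k,j} \;=\; \operatorname{diag}\!\big(s_{k,j,1},\dots,s_{k,j,d}\big),
  \quad s_{k,j,i} \in \{0,1\},
\]
\[
  x_{k+1} \;=\; x_k \;-\; \gamma\, g_k,
  \qquad
  g_k \;=\; \nabla f(\hat{x}_k;\,\xi_k),
\]
% J(k) indexes the updates still in flight when iterate k is read, and the diagonal
% matrix S_{k,j} selects the coordinates of update j that had not yet reached shared memory.
```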
“…In Chaturapruek, Duchi, and Ré (2015), the authors show that because of the noise inherent to the sampling process within SGD, the errors introduced by asynchrony in the shared-memory implementation are asymptotically negligible. A detailed comparison of both the computer-network and shared-memory implementations is given in Lian et al. (2015). Again, the aforementioned asynchronous algorithms are not distributed, since they rely on shared memory or a central coordinator.…”
Section: Parallel SGD (mentioning)
confidence: 99%
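
For context on the comparison mentioned above, the consistent-read (computer-network) setting is commonly modeled with a stale-gradient update of roughly the form below; the mini-batch size M, delay variables τ_{k,m}, and delay bound T are generic notation rather than a verbatim reproduction of Lian et al. (2015).

```latex
% Hedged sketch of an asynchronous, delayed-gradient SG update (consistent reads).
\[
  x_{k+1} \;=\; x_k \;-\; \gamma_k \sum_{m=1}^{M}
  \nabla F\!\big(x_{\,k-\tau_{k,m}};\,\xi_{k,m}\big),
  \qquad 0 \le \tau_{k,m} \le T.
\]
% Each stochastic gradient is evaluated at an iterate that is up to T steps old; the
% shared-memory variant additionally allows the read iterate itself to be inconsistent
% across coordinates.
```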
“…Note that in most of the early literature, AP is used to parallelize optimization-based iterative algorithms. For example, Niu et al. (2011) and Lian et al. (2015) applied asynchronous parallelism to accelerate the stochastic gradient descent algorithm for solving deep learning, primal SVM, matrix completion, etc. Liu and Wright (2014) and Hsieh et al. (2015) proposed asynchronous parallel stochastic coordinate descent for solving dual SVM, LASSO, etc.…”
Section: Related Work (mentioning)
confidence: 99%
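
To contrast with the gradient-based sketch above, the asynchronous stochastic coordinate descent referenced in this statement updates one randomly chosen coordinate at a time from a possibly stale snapshot. The sketch below illustrates that pattern only; the quadratic objective, step size, and thread count are illustrative assumptions, not the cited algorithms' exact settings.

```python
# Minimal sketch of asynchronous stochastic coordinate descent on shared memory.
# Objective: f(x) = 0.5 * x^T Q x - c^T x with Q positive definite (illustrative).
import threading
import numpy as np

dim, n_threads, n_steps = 20, 4, 5000
rng = np.random.default_rng(1)
M = rng.normal(size=(dim, dim))
Q = M @ M.T + dim * np.eye(dim)          # positive definite quadratic
c = rng.normal(size=dim)
x = np.zeros(dim)                        # shared iterate, updated coordinate-wise without locks
step_size = 1.0 / np.max(np.diag(Q))     # conservative coordinate-wise step size

def worker(seed):
    rng_w = np.random.default_rng(seed)
    for _ in range(n_steps):
        snapshot = x.copy()               # possibly stale view of the shared iterate
        i = rng_w.integers(dim)
        partial = Q[i] @ snapshot - c[i]  # partial derivative w.r.t. coordinate i
        x[i] -= step_size * partial       # lock-free single-coordinate write

threads = [threading.Thread(target=worker, args=(s,)) for s in range(n_threads)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print("objective:", 0.5 * x @ Q @ x - c @ x)
```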