2021
DOI: 10.48550/arxiv.2106.04759
Preprint

Communication-efficient SGD: From Local SGD to One-Shot Averaging

Abstract: We consider speeding up stochastic gradient descent (SGD) by parallelizing it across multiple workers. We assume the same data set is shared among N workers, who can take SGD steps and coordinate with a central server. While it is possible to obtain a linear reduction in the variance by averaging all the stochastic gradients at every step, this requires a lot of communication between the workers and the server, which can dramatically reduce the gains from parallelism. The Local SGD method, proposed and analyzed…
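To make the setting in the abstract concrete, here is a minimal sketch of Local SGD with periodic averaging on a toy quadratic objective. The objective, noise model, step size, and communication period H are illustrative assumptions and not the exact algorithm or parameters analyzed in the paper; note that H = 1 corresponds to averaging at every step, while a single round at the end corresponds to one-shot averaging.

```python
import numpy as np

# Illustrative sketch of Local SGD with periodic averaging (assumptions:
# quadratic objective f(w) = 0.5 * ||w||^2, Gaussian gradient noise,
# fixed step size, communication every H steps).

def stochastic_grad(w, rng, noise_std=1.0):
    """Noisy gradient of f(w) = 0.5 * ||w||^2; the true gradient is w."""
    return w + noise_std * rng.standard_normal(w.shape)

def local_sgd(num_workers=8, dim=10, total_steps=1000, H=50, lr=0.05, seed=0):
    rng = np.random.default_rng(seed)
    w0 = rng.standard_normal(dim)
    workers = [w0.copy() for _ in range(num_workers)]  # shared starting point

    for t in range(1, total_steps + 1):
        # Each worker takes an independent local SGD step.
        for i in range(num_workers):
            workers[i] -= lr * stochastic_grad(workers[i], rng)
        # Every H steps the server averages the iterates (one communication round).
        if t % H == 0:
            avg = np.mean(workers, axis=0)
            workers = [avg.copy() for _ in range(num_workers)]

    # Final averaging; with H = total_steps this reduces to one-shot averaging.
    return np.mean(workers, axis=0)

if __name__ == "__main__":
    w_final = local_sgd()
    print("distance to optimum:", np.linalg.norm(w_final))
```

In this sketch the number of communication rounds is total_steps / H, which is the quantity traded off against variance reduction when moving from per-step averaging toward one-shot averaging.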

Cited by 1 publication (1 citation statement)
References: 0 publications
“…However, their analysis and algorithm are not applicable in the composite setting, and they do not consider statistical recovery at all. A more recent work, Spiridonoff et al. (2021), showed that the number of rounds can be independent of T under the homogeneous setting. Recently, there is a line of work focusing on analyzing nonconvex problems in FL (Yu et al., 2019a;b; Basu et al., 2019; Haddadpour et al., 2019).…”
Section: A Related Work
Confidence: 99%