2019
DOI: 10.1007/978-3-030-10925-7_24
Efficient Decentralized Deep Learning by Dynamic Model Averaging

Abstract: We propose an efficient protocol for decentralized training of deep neural networks from distributed data sources. The proposed protocol makes it possible to handle different phases of model training equally well and to quickly adapt to concept drifts. This leads to a reduction of communication by an order of magnitude compared to periodically communicating state-of-the-art approaches. Moreover, we derive a communication bound that scales well with the hardness of the serialized learning problem. The reduction in communi…
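To make the protocol's central idea concrete, below is a minimal sketch of decentralized training with dynamic model averaging, assuming a simple divergence-threshold synchronization criterion; the names (average, divergence, local_step, delta) are illustrative and not taken from the paper, and the divergence check is centralized here purely for readability.

```python
import numpy as np

def average(models):
    """Coordinate-wise average of a list of flat parameter vectors."""
    return np.mean(models, axis=0)

def divergence(models, reference):
    """Average squared distance of the local models from a reference model."""
    return float(np.mean([np.sum((m - reference) ** 2) for m in models]))

def train_dynamic_averaging(models, local_step, delta, rounds):
    """Decentralized training loop: workers update locally and are averaged
    only when their divergence from the last synchronized model exceeds delta."""
    reference = average(models)
    for _ in range(rounds):
        # One local update per worker on its own data shard.
        models = [local_step(m) for m in models]
        # Synchronize (communicate and average) only when the models have drifted apart.
        if divergence(models, reference) > delta:
            reference = average(models)
            models = [reference.copy() for _ in models]
    return reference
```

The threshold delta directly trades communication against synchrony: a large delta approaches independent local training, while delta = 0 degenerates to averaging after every local step.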


Cited by 83 publications (68 citation statements)
References 19 publications
“…The superior training speed-up performance of model averaging has been empirically observed in various deep learning scenarios, e.g., CNN for MNIST in (Zhang et al. 2016; Kamp et al. 2018; McMahan et al. 2017); VGG for CIFAR10 in (Zhou and Cong 2017); DNN-GMM for speech recognition in (Chen and Huo 2016; Su, Chen, and Xu 2018); and LSTM for language modeling in (McMahan et al. 2017). A thorough empirical study of ResNet over CIFAR and ImageNet is also available in the recent work (Lin, Stich, and Jaggi 2018).…”
Section: Methods (mentioning)
confidence: 97%
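For readers unfamiliar with the term, the "model averaging" these works study is plain coordinate-wise averaging of worker parameters, as in the hedged sketch below; the representation of a model as a list of layer arrays and the function name average_models are illustrative assumptions.

```python
import numpy as np

def average_models(worker_models):
    """Average a list of models, each given as a list of layer parameter arrays.

    The result is a single model whose every layer is the coordinate-wise mean
    of the corresponding layers across workers (periodic, FedAvg-style averaging).
    """
    num_workers = len(worker_models)
    num_layers = len(worker_models[0])
    return [
        sum(model[layer] for model in worker_models) / num_workers
        for layer in range(num_layers)
    ]

# Example: three workers, each holding a small two-layer model.
workers = [[np.random.randn(4, 4), np.random.randn(4)] for _ in range(3)]
global_model = average_models(workers)
```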
“…Later, the design was enhanced by coding the updates such that the full update-averaging can still be realized using only a portion of coded updates [9]. The second approach also aims at reducing the number of transmitting devices, but the scheduling criterion is update significance instead of computation speed [10], [11]. If FEEL is implemented based on model averaging, the update significance is measured by the model variance which indicates the divergence of a particular local model from the average across all local models [10].…”
Section: A. Federated Edge Learning and Multi-access (mentioning)
confidence: 99%
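A rough sketch of the variance-based significance criterion described in this statement, assuming each local model is a flat parameter vector: a worker's update significance is its model's squared distance from the average of all local models, and only the most divergent workers are scheduled. The top-k selection and the name schedule_by_significance are illustrative, not the cited papers' exact rule.

```python
import numpy as np

def update_significance(local_models):
    """Divergence of each local model from the average across all local models."""
    mean_model = np.mean(local_models, axis=0)
    return np.array([float(np.sum((m - mean_model) ** 2)) for m in local_models])

def schedule_by_significance(local_models, budget):
    """Select the `budget` workers whose models diverge most from the average."""
    significance = update_significance(local_models)
    return np.argsort(significance)[::-1][:budget]

# Example: 8 workers, channel budget for 3 uploads per round.
models = [np.random.randn(10) for _ in range(8)]
selected_workers = schedule_by_significance(models, budget=3)
```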
“…Kamp et al [52] proposed to average models dynamically depending on the utility of the communication, which leads to a reduction of communication by an order of magnitude compared to periodically communicating state-of-the-art approaches. This facet is well suited for massively distributed systems with limited communication infrastructure.…”
Section: Updates Reduction (mentioning)
confidence: 99%
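To illustrate the "communicate only when it pays off" facet highlighted above, here is a minimal per-worker trigger, sketched under the assumption that each worker compares its current model with the last synchronized reference and requests an averaging round only when its drift exceeds a threshold; this is a simplified stand-in for the utility-based criterion, not the authors' exact condition.

```python
import numpy as np

def should_communicate(local_model, reference_model, delta):
    """Local trigger: ask for an averaging round only if this worker's model
    has drifted more than delta from the last synchronized reference."""
    drift = float(np.sum((local_model - reference_model) ** 2))
    return drift > delta
```

In this sketch the check uses only locally available quantities, so no communication is spent while a worker stays close to the reference; this is the kind of behavior that yields the order-of-magnitude savings over fixed-period averaging reported above.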