2019
DOI: 10.48550/arxiv.1909.05020
Preprint

Distributed Deep Learning with Event-Triggered Communication

Abstract: We develop a Distributed Event-Triggered Stochastic GRAdient Descent (DETSGRAD) algorithm for solving non-convex optimization problems typically encountered in distributed deep learning. We propose a novel communication triggering mechanism that would allow the networked agents to update their model parameters aperiodically and provide sufficient conditions on the algorithm step-sizes that guarantee the asymptotic mean-square convergence. The algorithm is applied to a distributed supervised-learning problem, i…
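
Below is a minimal sketch of how an event-triggered local update of this kind can look in code. It is an illustration under assumed choices (a decaying step size, an ℓ1 model-difference trigger, and hypothetical constants `lr0`, `c0`, `lr_decay`, `thr_decay`), not the authors' DETSGRAD implementation.

```python
import numpy as np

def event_triggered_sgd_step(theta, theta_last_sent, grad, step,
                             lr0=0.1, lr_decay=0.6, c0=1.0, thr_decay=0.7):
    """One local iteration of an event-triggered SGD scheme (illustrative sketch).

    theta           : current local parameters (1-D array)
    theta_last_sent : copy of the parameters most recently broadcast to neighbors
    grad            : stochastic gradient evaluated at theta
    step            : iteration counter
    Returns (updated parameters, broadcast flag).
    """
    # Decaying step size of the kind typically assumed for asymptotic convergence.
    lr = lr0 / (step + 1) ** lr_decay

    # Local stochastic gradient step.
    theta = theta - lr * grad

    # Event trigger: communicate only when the local model has drifted far enough
    # from the last broadcast copy, measured against a decaying threshold.
    threshold = c0 / (step + 1) ** thr_decay
    broadcast = np.linalg.norm(theta - theta_last_sent, ord=1) > threshold
    return theta, broadcast
```

In a full distributed loop each agent would, in addition, mix in the most recently received copies of its neighbors' parameters; only the trigger logic is shown here.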

Cited by 5 publications (7 citation statements) | References 21 publications

“…The event-triggered threshold plays an important role in this algorithm, since it determines both the communication cost among clients and the convergence performance. While the work in [28] measures the difference between the current local model and the new one as a threshold for broadcasting new models, other works [29,30] use a gradient-based metric at the clients to decide when an SGD update should be sent to the other neighbors.…”
Section: Distributed Deep Learning (mentioning; confidence: 99%)
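
To make the two trigger families in the quoted statement concrete, here is a small illustrative sketch; the function names and the specific norms are assumptions for exposition, not the exact rules used in [28]-[30].

```python
import numpy as np

def model_difference_trigger(theta, theta_last_sent, threshold):
    # Broadcast when the local model has moved sufficiently far from the copy
    # last sent to the neighbors (the style the statement attributes to [28]).
    return np.linalg.norm(theta - theta_last_sent) > threshold

def gradient_metric_trigger(grad, threshold):
    # Broadcast when a gradient-based metric at the client is large enough,
    # i.e. the local update is still considered informative (in the spirit of [29,30]).
    return np.linalg.norm(grad) > threshold
```

Either rule gates the same broadcast step; they differ only in which quantity is compared against the threshold.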
“…Notable works include using average consensus [7] and Bayesian methods [8], [9], which trade convergence time for resilience to individual failures. The approach of George et al [10] offers convergence speed comparable to a server-based approach, at the cost of assuming that the communication topology of the clients is fixed and predetermined.…”
Section: Related Work (mentioning; confidence: 99%)
“…The paper closest to ours appears to be [53], where the authors considered a federated learning scenario, proposed an event-triggered communication scheme for the model parameters based on thresholds that depend on the learning rate, and showed a reduction in communication for distributed training. Compared to that work, we consider an adaptive threshold rather than selecting the same threshold across all parameters.…”
Section: Related Work (mentioning; confidence: 99%)
“…Hence the adaptive threshold makes our algorithm robust to different neural network models and different datasets. Our theoretical results rest on a generic bound on the threshold, unlike [53], which provides a bound only for a specific threshold form that depends on the learning rate. Further, we highlight the implementation challenges of event-triggered communication in an HPC environment, which differs from the federated learning setting considered in [53], which usually involves wireless communication.…”
Section: Related Work (mentioning; confidence: 99%)
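
The contrast drawn here, a single learning-rate-dependent threshold as in [53] versus an adaptive, per-tensor threshold, can be sketched as follows; both threshold forms and the constants `c` and `rho` are assumptions for illustration, not the formulas from either paper.

```python
import numpy as np

def lr_dependent_threshold(lr, c=1.0):
    # One threshold shared by all parameters, tied to the learning rate
    # (the form the statement attributes to [53]).
    return c * lr

def adaptive_threshold(delta, rho=1e-2):
    # Per-tensor threshold scaled by the size of the parameter change itself,
    # so the trigger adapts across layers, models, and datasets.
    return rho * np.linalg.norm(delta)

def should_broadcast(delta, threshold):
    # Communicate only when the accumulated parameter change exceeds the threshold.
    return np.linalg.norm(delta) > threshold
```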