2018 IEEE International Symposium on Information Theory (ISIT)
DOI: 10.1109/isit.2018.8437467

Improving Distributed Gradient Descent Using Reed-Solomon Codes

Abstract: Today's massively-sized datasets have made it necessary to often perform computations on them in a distributed manner. In principle, a computational task is divided into subtasks which are distributed over a cluster operated by a taskmaster. One issue faced in practice is the delay incurred due to the presence of slow machines, known as stragglers. Several schemes, including those based on replication, have been proposed in the literature to mitigate the effects of stragglers and more recently, those inspired …
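The abstract stops short of the construction itself; as a rough illustration of the straggler problem it describes, here is a minimal sketch of a replication-based scheme (not the paper's Reed-Solomon construction): each data partition is stored on more than one worker, so the master can assemble the full gradient as long as at least one copy of every partition is returned. All variable names, sizes, and the choice of responding workers below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

n_workers, n_partitions = 6, 3                 # each partition ends up on 2 workers
X = rng.normal(size=(60, 4))                   # toy least-squares data
y = rng.normal(size=60)
theta = np.zeros(4)

# Round-robin replication: worker j holds partition j % n_partitions.
assignment = [j % n_partitions for j in range(n_workers)]
parts = np.array_split(np.arange(60), n_partitions)

def partial_gradient(p, theta):
    """Least-squares gradient restricted to the rows in partition p."""
    Xp, yp = X[parts[p]], y[parts[p]]
    return Xp.T @ (Xp @ theta - yp)

# Simulate stragglers: only these workers respond this round (0 and 4 are slow).
responded = [1, 2, 3, 5]

# The master keeps one returned copy per partition.
received = {}
for j in responded:
    received.setdefault(assignment[j], partial_gradient(assignment[j], theta))

if len(received) == n_partitions:
    full_gradient = sum(received.values())
    print("full gradient recovered despite stragglers:", full_gradient)
else:
    print("too many stragglers; a partition is missing")
```

Coding-based schemes such as the one in this paper target the same recovery guarantee with less redundancy per worker than plain replication.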

Cited by 150 publications (157 citation statements)
References 18 publications

Citation statements (ordered by relevance):
“…Each column of the encoding matrix $B$ corresponds to a partition $D_i$ and is associated with a polynomial that evaluates to zero at the respective workers who have not been assigned that partition part. For more details the reader is referred to [2]. The matrix $\tilde{T} = T \cdot \mathrm{diag}(w)$ is equal to $T$ with its columns each scaled by the respective entry of $w$, thus $\tilde{T}_{(1)} = w$. A direct consequence of this is that $a_{\mathcal{I}}^{T} B_{\mathcal{I}} = e_{1}^{T} \tilde{T} = \tilde{T}_{(1)} = w$, which completes the proof.…”
Section: Weighted Gradient Coding
confidence: 73%
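The notation in this excerpt boils down to a simple column-scaling identity. The snippet below checks it numerically under the assumption, implicit in the quoted proof, that the first row of $T$ is the all-ones vector, so scaling the columns of $T$ by the entries of $w$ leaves $w$ in the first row; apart from that first row, the matrix here is arbitrary and nothing else from the cited construction is reproduced.

```python
import numpy as np

rng = np.random.default_rng(1)

k = 4
w = rng.uniform(0.5, 2.0, size=k)              # weight vector
T = np.vstack([np.ones(k),                     # assumed: first row of T is all ones
               rng.normal(size=(k - 1, k))])

T_tilde = T @ np.diag(w)                       # scale column i of T by w[i]

e1 = np.zeros(k); e1[0] = 1.0
assert np.allclose(T_tilde[0], w)              # first row of T~ equals w
assert np.allclose(e1 @ T_tilde, w)            # e_1^T T~ = T~_(1) = w, the step used above
print("column-scaling identity verified:", T_tilde[0])
```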
“…Gradient coding requires the central server to receive the subtasks of any fixed fraction of the workers. We obtain an extension based on balanced Reed-Solomon codes [2,14], introducing weighted gradient coding, where the central server recovers a weighted sum of the partial gradients of the loss function.…”
Section: Introduction
confidence: 99%
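As a toy illustration of the decoding condition behind this extension, the sketch below replaces the balanced Reed-Solomon encoding matrix of [2,14] with a random one (so, unlike the balanced construction, every worker here would need access to all partitions). The master waits for any $k$ of the $n$ workers, solves $B_{\mathcal{I}}^{T} a_{\mathcal{I}} = w$ for a decoding vector, and recovers the weighted sum of the partial gradients; all names and sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)

n, k, d = 6, 3, 4                        # workers, partitions, gradient dimension
B = rng.normal(size=(n, k))              # toy encoding matrix (random, not Reed-Solomon)
G = rng.normal(size=(k, d))              # row i = partial gradient of partition i
w = np.array([0.2, 0.5, 0.3])            # target weights

# Each worker j sends one coded vector: the combination B[j] of its partial gradients.
coded = B @ G                            # shape (n, d)

# Any k responsive workers suffice: B_I is k x k and generically invertible.
I = [1, 3, 5]
a_I = np.linalg.solve(B[I].T, w)         # decoding vector with B_I^T a_I = w

decoded = a_I @ coded[I]                 # a_I^T B_I G = w^T G
assert np.allclose(decoded, w @ G)       # equals the weighted sum of partial gradients
print("weighted gradient recovered:", decoded)
```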
“…The PS then updates $\boldsymbol{\theta}^{i+1}$, as well as $\boldsymbol{\theta}_{1}^{i} = \boldsymbol{\theta}^{i}$ and $\boldsymbol{\theta}_{m}^{i} = \boldsymbol{\theta}_{m}^{i-1}$ for $m = 2, 3$. The next iteration $i+1$ then continues with a check of condition (13) by the PS in the same way.…”
Section: B. Lazily Aggregated Gradient (LAG)
confidence: 99%
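Condition (13) itself is not reproduced in this excerpt, so the sketch below only illustrates the general lazily aggregated gradient idea it refers to: the parameter server reuses a worker's previously communicated gradient unless that gradient has changed by more than a threshold. The skip rule, the threshold, and all names are stand-ins, not the cited paper's criterion.

```python
import numpy as np

rng = np.random.default_rng(3)

M, d, lr, tau = 3, 4, 0.1, 1e-2                # workers, dimension, step size, skip threshold
A = [rng.normal(size=(20, d)) for _ in range(M)]
b = [rng.normal(size=20) for _ in range(M)]
theta = np.zeros(d)
last_grad = [np.zeros(d) for _ in range(M)]    # last gradient each worker communicated

def local_gradient(m, theta):
    """Least-squares gradient of worker m's local data."""
    return A[m].T @ (A[m] @ theta - b[m]) / len(b[m])

for it in range(50):
    agg = np.zeros(d)
    for m in range(M):
        g = local_gradient(m, theta)
        # Lazy rule (a stand-in for condition (13)): upload only if the gradient moved enough.
        if np.linalg.norm(g - last_grad[m]) > tau:
            last_grad[m] = g               # worker m communicates a fresh gradient
        agg += last_grad[m]                # otherwise the PS reuses the stale copy
    theta = theta - lr * agg               # PS update of theta^{i+1}

print("final parameters:", theta)
```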
“…It was presented in part in ISIT'18 [1], [2], CWIT'19 [3], and ICML'19 [4]. … computing gradients, thus accelerating the training of large-scale machine learning applications [12], [13], [14], [15], [16]. While matrix multiplication and gradient descent are two types of coded computing problems that have been studied, others include coded convolution [17], coded approximate computing [18], sparse coded matrix multiplication [19], and heterogeneous coded computing [20].…”
Section: Introduction
confidence: 99%