2018 IEEE International Symposium on Information Theory (ISIT)
DOI: 10.1109/isit.2018.8437549
Straggler-Proofing Massive-Scale Distributed Matrix Multiplication with D-Dimensional Product Codes

Cited by 72 publications (56 citation statements). References 9 publications.
“…The scheme in [13] requires an additional decoding phase and assumes the existence of a powerful master that can store the entire product C in memory and decode the missing blocks using the redundant chunks. The same is true of the other schemes in [14]-[16]. Moreover, these schemes fail when the number of stragglers exceeds the provisioned redundancy, whereas OverSketch degrades gracefully: one can ignore more workers than provisioned at the cost of accuracy in the result.…”
Section: Comparison With Existing Straggler Mitigation Schemes
confidence: 98%
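The decoding phase this excerpt refers to is easiest to see with a single parity chunk. Below is a minimal sketch, assuming a (k+1, k) single-parity code on row blocks of A (an illustrative simplification, not the exact scheme in [13]): each worker computes one block product, a parity worker computes the sum-block product, and the master recovers a straggler's block by subtraction, which is why it must hold the entire product C.

```python
import numpy as np

# Minimal sketch, assuming a (k+1, k) single-parity code on row blocks
# of A (a simplification for illustration, not the paper's construction).
# Worker i computes A_i @ B; the parity worker computes
# (A_1 + ... + A_k) @ B. If one worker straggles, the master recovers
# its block by subtraction -- the extra "decoding phase" that requires
# the master to hold the full product C.

rng = np.random.default_rng(0)
k = 4                                  # number of data blocks
A_blocks = [rng.standard_normal((2, 3)) for _ in range(k)]
B = rng.standard_normal((3, 5))

parity = sum(A_blocks)                 # encoding: one redundant chunk
tasks = A_blocks + [parity]            # k + 1 workers in total

results = [Ai @ B for Ai in tasks]     # each worker's local product
straggler = 2                          # pretend worker 2 never responds
results[straggler] = None

# Master-side decode: missing block = parity result - received blocks.
received = [r for i, r in enumerate(results[:k]) if i != straggler]
recovered = results[k] - sum(received)
assert np.allclose(recovered, A_blocks[straggler] @ B)
```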
“…A simpler version of this has been known in the HPC community as Algorithm-Based Fault Tolerance (ABFT) [18]. The authors in [14] generalize the results in [13] to a d-dimensional product code with only one parity in each dimension. In [15], the authors develop polynomial codes for matrix multiplication, which improve over [13] in terms of the recovery threshold, that is, the minimum number of workers required to recover the product C.…”
Section: B Related Work
confidence: 99%
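The d-dimensional product code with one parity per dimension can be sketched for d = 2. In the sketch below, the sum parities on the row blocks of A and column blocks of B, the grid layout, and the peeling step are illustrative assumptions: worker (i, j) computes A_i @ B_j, so the output grid is itself a product code, and a missing block is recovered from the parity along its row or column.

```python
import numpy as np

# Minimal sketch of a 2-D product code with one parity per dimension
# (the d = 2 case of the construction the excerpt describes; block
# sizes and the decoding step here are illustrative assumptions).
# A is split into k row blocks plus a row-sum parity; B into k column
# blocks plus a column-sum parity.

rng = np.random.default_rng(1)
k = 3
A_blocks = [rng.standard_normal((2, 4)) for _ in range(k)]
B_blocks = [rng.standard_normal((4, 2)) for _ in range(k)]
A_blocks.append(sum(A_blocks))         # parity block in the row dimension
B_blocks.append(sum(B_blocks))         # parity block in the column dimension

# (k+1) x (k+1) grid of worker results; mark one straggler.
C = [[Ai @ Bj for Bj in B_blocks] for Ai in A_blocks]
si, sj = 1, 2
C[si][sj] = None

# Peel along the row: the row's parity result minus the other data blocks.
C[si][sj] = C[si][k] - sum(C[si][j] for j in range(k) if j != sj)
assert np.allclose(C[si][sj], A_blocks[si] @ B_blocks[sj])
```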
“…MDS codes, however, have the disadvantage of high encoding and decoding complexity, which can be restrictive in setups with a large number of workers. [2] attacks this problem by presenting a coded computation scheme based on d-dimensional product codes. [14] presents a scheme referred to as polynomial codes for coded matrix multiplication with input matrices over a large finite field.…”
Section: B Related Work
confidence: 99%
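The polynomial codes mentioned in these excerpts admit a short sketch as well. The version below works over the reals for readability, whereas the actual construction operates over a large finite field; the block counts, evaluation points, and choice of responding workers are illustrative assumptions. Any m*n responses suffice to interpolate the product, which is the recovery threshold discussed above.

```python
import numpy as np

# Minimal sketch of polynomial codes over the reals (illustrative; the
# real scheme uses a large finite field to avoid numerical issues).
# A is split into m row blocks, B into n column blocks. Worker w returns
# C(x_w) = A(x_w) @ B(x_w), where A(x) = sum_i A_i x^i and
# B(x) = sum_j B_j x^(j*m). C(x) has degree m*n - 1, so any m*n results
# suffice to interpolate every block C_ij = A_i @ B_j.

rng = np.random.default_rng(2)
m, n = 2, 2
A_blocks = [rng.standard_normal((2, 3)) for _ in range(m)]
B_blocks = [rng.standard_normal((3, 2)) for _ in range(n)]

def A_poly(x):
    return sum(Ai * x**i for i, Ai in enumerate(A_blocks))

def B_poly(x):
    return sum(Bj * x**(j * m) for j, Bj in enumerate(B_blocks))

# 6 workers provisioned; any m*n = 4 of them are enough to decode.
points = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
results = {x: A_poly(x) @ B_poly(x) for x in points}
fastest = points[[0, 2, 3, 5]]          # pretend these responded first

# Interpolate each entry of the degree m*n - 1 matrix polynomial C(x).
V = np.vander(fastest, m * n, increasing=True)
stacked = np.stack([results[x] for x in fastest])       # (m*n, 2, 2)
coeffs = np.linalg.solve(V, stacked.reshape(m * n, -1))

# The coefficient of x^(i + j*m) is the product block A_i @ B_j.
C_01 = coeffs[0 + 1 * m].reshape(2, 2)
assert np.allclose(C_01, A_blocks[0] @ B_blocks[1])
```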
“…Coding has been applied to distributed fog computing and machine learning to deal with the problem of stragglers [4] and to reduce the usage of computation and communication resources [5]. Within coded distributed machine learning [2], matrix multiplication [6], [7] and gradient descent [8], [9] have attracted considerable attention.…”
Section: Introduction
confidence: 99%
“…However, in a large-scale network with thousands of nodes, these MDS-based codes become impractical [1], [7] because of the high computation and communication costs associated with encoding and decoding [13]. In addition, most previous works consider the master-worker pattern.…”
Section: Introduction
confidence: 99%