2016 IEEE International Symposium on Information Theory (ISIT)
DOI: 10.1109/isit.2016.7541478

Speeding up distributed machine learning using codes

Abstract: Codes are widely used in many engineering applications to offer robustness against noise. In large-scale systems there are several types of noise that can affect the performance of distributed machine learning algorithms (straggler nodes, system failures, or communication bottlenecks), but there has been little interaction cutting across codes, machine learning, and distributed systems. In this work, we provide theoretical insights on how coded solutions can achieve significant gains compared to uncoded ones. W…
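The coding idea the abstract refers to can be made concrete with a small sketch. Assuming a standard (n, k) MDS-style setup for coded matrix-vector multiplication (the block split, the sum-parity code, and all variable names here are illustrative choices, not the paper's exact construction), any k = 2 of n = 3 worker results recover A @ x:

    import numpy as np

    # Split A into k=2 row blocks and encode into n=3 coded blocks so
    # that A @ x is recoverable from ANY 2 of the 3 worker results.
    A = np.arange(12, dtype=float).reshape(4, 3)
    x = np.array([1.0, 2.0, 3.0])

    A1, A2 = A[:2], A[2:]      # systematic blocks
    A3 = A1 + A2               # parity block (a simple (3, 2) MDS code)

    # Each "worker" computes one small product; suppose worker 2 straggles.
    y1 = A1 @ x
    y3 = A3 @ x

    # Decode: recover the straggler's share from the parity result.
    y2 = y3 - y1
    assert np.allclose(np.concatenate([y1, y2]), A @ x)

With this redundancy the job finishes as soon as the fastest two workers respond, which is the source of the latency gains the abstract describes.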

Cited by 317 publications (817 citation statements)
References 30 publications (26 reference statements)

“…There is also an emerging body of work on using replication or erasure coding to mitigate stragglers in linear computations, such as matrix-vector multiplication (Dutta et al. 2016; Lee et al. 2016; Mallick et al. 2018) and matrix-matrix multiplication (Yang et al. 2017; Yu et al. 2017), and machine learning (Ferdinand and Draper 2016; Tandon et al. 2017). Our work is for general (possibly nonlinear) computations for which coding techniques cannot be directly applied, and we have to resort to simpler task replication strategies.…”
Section: Related Prior Work (citation type: mentioning)
confidence: 99%
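As a counterpoint to the coded linear schemes above, this excerpt's fallback for general nonlinear computations is plain task replication. A minimal first-response-wins sketch (the helper name replicate_first and the toy slow_square task are hypothetical, introduced only for illustration):

    from concurrent.futures import ThreadPoolExecutor, FIRST_COMPLETED, wait
    import random, time

    def replicate_first(fn, args, pool, r=2):
        # Submit the same task to r workers and keep the first result;
        # faster replicas mask stragglers for arbitrary (nonlinear) fn.
        futures = [pool.submit(fn, *args) for _ in range(r)]
        done, pending = wait(futures, return_when=FIRST_COMPLETED)
        for f in pending:
            f.cancel()               # best effort; a replica may already run
        return next(iter(done)).result()

    def slow_square(v):
        time.sleep(random.random())  # simulated straggling delay
        return v * v

    with ThreadPoolExecutor(max_workers=4) as pool:
        print(replicate_first(slow_square, (7,), pool, r=2))  # prints 49

Replication costs an r-fold increase in work but applies to any computation, which is exactly the trade-off the quoted passage is making.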
“…A wealth of straggler avoidance techniques have been proposed in recent years for DGD as well as other distributed computation tasks [5–48]. The common design notion behind all these schemes is the assignment of redundant computations/tasks to workers, such that faster workers can compensate for the stragglers.…”
Section: Introduction (citation type: mentioning)
confidence: 99%
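One concrete instance of such redundant task assignment is gradient coding (Tandon et al. 2017), cited above. Below is a sketch of the standard small example from that line of work, with n = 3 workers tolerating s = 1 straggler; the toy gradients and the hand-derived decoding coefficients are illustrative:

    import numpy as np

    # Each worker i sends B[i] @ g, where g stacks the 3 partial gradients.
    # B is chosen so that ANY 2 of its rows combine to the all-ones row,
    # i.e. any 2 worker messages recover the full gradient sum.
    g = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])  # toy partial gradients
    B = np.array([[0.5, 1.0,  0.0],
                  [0.0, 1.0, -1.0],
                  [0.5, 0.0,  1.0]])

    sent = B @ g                 # row i = worker i's message
    # Suppose worker 1 (zero-indexed) straggles; decode from workers 0 and 2:
    # 1 * B[0] + 1 * B[2] = (1, 1, 1), so the same combination of messages
    # yields the sum of all partial gradients.
    a = np.array([1.0, 1.0])
    full_grad = a @ sent[[0, 2]]
    assert np.allclose(full_grad, g.sum(axis=0))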
“…The broadcast rate, or the rate of an index code, is the ratio of the code length to the length of each of the messages. The problem of designing index codes with the smallest possible broadcast rate is significant because of its applications, such as multimedia content delivery [3], coded caching [4], distributed computation [5], and also because of its relation to network coding [6], [7] and coding for distributed storage [8], [9].…”
Section: Introduction (citation type: mentioning)
confidence: 99%
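The broadcast-rate definition in this excerpt is easiest to see on the smallest nontrivial instance: two receivers, each wanting one message and holding the other as side information. A toy sketch (message values arbitrary) showing that one XOR transmission achieves broadcast rate 1 instead of the uncoded rate 2:

    # Receiver 1 wants x1 and already knows x2; receiver 2 wants x2 and
    # knows x1. Uncoded broadcast needs 2 transmissions; one XOR suffices,
    # so the broadcast rate is 1 (code length / message length).
    x1, x2 = 0b1011, 0b0110      # two 4-bit messages

    coded = x1 ^ x2              # the single broadcast transmission

    # Each receiver decodes using its side information.
    assert coded ^ x2 == x1      # receiver 1 recovers x1
    assert coded ^ x1 == x2      # receiver 2 recovers x2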