2018 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)
DOI: 10.1109/ipdpsw.2018.00137

Near-Optimal Straggler Mitigation for Distributed Gradient Methods

Abstract: Modern learning algorithms use gradient descent updates to train inferential models that best explain data. Scaling these approaches to massive data sizes requires proper distributed gradient descent schemes where distributed worker nodes compute partial gradients based on their partial and local data sets, and send the results to a master node where all the computations are aggregated into a full gradient and the learning model is updated. However, a major performance bottleneck that arises is that some of th…
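To make the master/worker pattern described in the abstract concrete, the following is a minimal simulation sketch, not the paper's actual scheme: n workers each hold a data shard and compute partial least-squares gradients, and the master aggregates only the fastest k responses per iteration to sidestep stragglers. All names and parameters here (n, k, the exponential delay model, the learning rate) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def partial_gradient(w, X_shard, y_shard):
    # Least-squares gradient contribution from one worker's local shard.
    return X_shard.T @ (X_shard @ w - y_shard)

def simulate_round(w, shards, k):
    # One iteration: draw random compute times and keep only the fastest k workers.
    times = rng.exponential(1.0, size=len(shards))   # hypothetical straggling model
    fastest = np.argsort(times)[:k]                  # indices of non-straggling workers
    grads = [partial_gradient(w, *shards[i]) for i in fastest]
    return sum(grads) / k                            # approximate full-gradient aggregate

# Toy data split row-wise across n workers.
n, k, d = 10, 8, 5
X = rng.normal(size=(1000, d))
y = rng.normal(size=1000)
shards = [(X[i::n], y[i::n]) for i in range(n)]

w = np.zeros(d)
for _ in range(200):
    w -= 1e-3 * simulate_round(w, shards, k)
```

Waiting for only k of n responses trades some gradient accuracy (or, in coded schemes, extra redundancy) for a shorter iteration time when a few workers run slow, which is the bottleneck the abstract points to.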

Cited by 88 publications (81 citation statements)
References 22 publications
“…3) The heterogeneity of computation, storage and communication capabilities across different devices brings unique system challenges to tame latency for on-device distributed training, e.g., the stragglers (i.e., devices that run slow) may cause significant delays [8], [17]. 4) The arbitrarily adversarial behaviors of the devices (e.g., …”
Section: Introduction (mentioning; confidence: 99%)
“…Since err_F(E) does not depend on the specific set of stragglers, but only on its size, we get (5) from (17)…”
Section: Appendix A: Matrix Inversion Lemma (mentioning; confidence: 99%)
“…Uncoded distributed computation with MMC (UC-MMC) is introduced in [5,13,14], and is shown to outperform coded computation in terms of average completion time, concluding that coded computation is more effective against persistent stragglers, particularly when the full gradient is required at each iteration. Coded GD strategies are mainly designed for full gradient computation; hence, the master needs to wait until all the gradients can be recovered.…”
Section: Introduction (mentioning; confidence: 99%)
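The excerpt above contrasts uncoded computation with coded gradient descent, where the master recovers the full gradient from encoded worker messages. As an illustration only, using the textbook 3-worker, 1-straggler construction rather than the scheme of this paper or of the works cited in the excerpt, the sketch below has each worker send a fixed linear combination of partial gradients so that any two responses suffice to decode the full gradient.

```python
import numpy as np

# Toy partial gradients g1, g2, g3 (one per data partition); the target is their sum.
g1, g2, g3 = np.random.default_rng(1).normal(size=(3, 4))
full = g1 + g2 + g3

# Encoded messages: fixed linear combinations sent by the three workers.
w1 = 0.5 * g1 + g2
w2 = g2 - g3
w3 = 0.5 * g1 + g3

# Decoding coefficients for each pair of workers that might respond first.
decoders = {
    (1, 2): (2.0, -1.0),  # 2*w1 - 1*w2 = g1 + g2 + g3
    (1, 3): (1.0, 1.0),   # 1*w1 + 1*w3 = g1 + g2 + g3
    (2, 3): (1.0, 2.0),   # 1*w2 + 2*w3 = g1 + g2 + g3
}
msgs = {1: w1, 2: w2, 3: w3}
for (i, j), (a, b) in decoders.items():
    recovered = a * msgs[i] + b * msgs[j]
    assert np.allclose(recovered, full)  # full gradient recovered despite one straggler
```

This makes the excerpt's point visible: decoding targets the exact full sum g1 + g2 + g3, so the master must wait until enough encoded messages arrive to recover it, which is why coded schemes pay off mainly when the full gradient is required and stragglers are persistent.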