2019
DOI: 10.1109/tit.2019.2926704

Near Optimal Coded Data Shuffling for Distributed Learning

Abstract: Data shuffling across a distributed cluster of nodes is one of the critical steps in implementing large-scale learning algorithms. Randomly shuffling the dataset among a cluster of workers allows different nodes to obtain fresh data assignments at each learning epoch. This process has been shown to improve the learning process. However, the statistical benefits of distributed data shuffling come at the cost of extra communication overhead from the master node to the worker nodes, and can act as one…
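As a rough illustration of where this communication overhead comes from, the Python sketch below draws a fresh random assignment for each epoch and counts how many data points the master would have to send if it transmitted every missing point uncoded. It is a toy model under assumptions that are not taken from the paper: N data points split into equal batches across K workers (N divisible by K), with each worker storing only its previous batch. The function names `random_assignment` and `uncoded_load` are hypothetical.

```python
import random

def random_assignment(num_points, num_workers, rng):
    """Randomly partition point indices into equal batches, one per worker.

    Assumes num_points is divisible by num_workers (toy model, not from the paper).
    """
    indices = list(range(num_points))
    rng.shuffle(indices)
    batch = num_points // num_workers
    return [set(indices[k * batch:(k + 1) * batch]) for k in range(num_workers)]

def uncoded_load(old, new):
    """Count the points the master must send uncoded.

    Assumption: worker k stores only its previous batch old[k], so every point
    in new[k] that is not in old[k] has to be transmitted individually.
    """
    return sum(len(new_k - old_k) for old_k, new_k in zip(old, new))

rng = random.Random(0)  # fixed seed so the toy run is reproducible
N, K = 12, 3            # hypothetical sizes: 12 data points, 3 workers
old = random_assignment(N, K, rng)  # assignment in epoch t
new = random_assignment(N, K, rng)  # fresh assignment in epoch t + 1
print("uncoded transmissions this epoch:", uncoded_load(old, new))
```

In this model every point that moves to a different worker costs one uncoded transmission; coded shuffling schemes try to serve several workers with a single broadcast instead.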

Cited by 25 publications (19 citation statements)
References 52 publications (105 reference statements)
“…Attia and Tandon presented a theoretical lower bound on the communication overhead for data shuffling as a function of the number of workers, the number of data points, and the available storage per node. They proposed a coded communication scheme to show that the communication overhead is within a multiplicative factor of the theoretical lower bound. Elmahdy and Mohajer considered the data shuffling problem in which a master node communicates a set of files to a set of worker nodes through a shared link.…”
Section: Related Work; mentioning
confidence: 99%
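The basic idea behind a coded communication scheme can be seen in a two-worker toy case: if each worker already stores the data point that the other one is newly assigned, a single XOR broadcast over the shared link serves both. The Python sketch below is a hypothetical illustration of this index-coding idea under uncoded storage, not the specific scheme of Attia and Tandon or of Elmahdy and Mohajer; the names `xor_bytes`, `point_a`, `point_b`, and `cache` are made up for the example.

```python
def xor_bytes(x: bytes, y: bytes) -> bytes:
    """Bitwise XOR of two equal-length byte strings."""
    if len(x) != len(y):
        raise ValueError("data points must have equal length")
    return bytes(a ^ b for a, b in zip(x, y))

# Hypothetical toy setup: two data points of equal size.
point_a = b"data-point-A"
point_b = b"data-point-B"

# Worker 1 stores A but is newly assigned B; worker 2 stores B but is newly assigned A.
cache = {1: point_a, 2: point_b}

# Uncoded delivery would send A and B separately (2 transmissions).
# Coded delivery broadcasts a single XOR over the shared link (1 transmission).
broadcast = xor_bytes(point_a, point_b)

# Each worker XORs out the point it already stores to recover the one it needs.
decoded_by_1 = xor_bytes(broadcast, cache[1])
decoded_by_2 = xor_bytes(broadcast, cache[2])
print("worker 1 recovered:", decoded_by_1)
print("worker 2 recovered:", decoded_by_2)
```

Here one coded transmission replaces two uncoded ones; general schemes extend this principle to many workers and data points, where the achievable savings depend on how much each worker can store.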
“…They proposed a coded communication scheme to show that the communication overhead is within a multiplicative factor of the theoretical lower bound [30]. Elmahdy and Mohajer considered the data shuffling problem in which a master node communicates a set of files to a set of worker nodes through a shared link. They proposed a deterministic and systematic coded shuffling scheme to determine the exact rate of cache files.…”
Section: Related Work Comparisons; mentioning
confidence: 99%
“…Inspired by the achievable and converse bounds for the single-bottleneck-link caching problem in [8]-[10], the authors in [11] then proposed a general coded data shuffling scheme, which was shown to be order-optimal to within a factor of 2 under the constraint of uncoded storage. Also in [11], the authors improved the performance of the general coded shuffling scheme by introducing an aligned coded delivery, which was shown to be optimal under the constraint of uncoded storage… Recently, inspired by the improved data shuffling scheme in [11], the authors in [12] proposed a linear coding scheme based on interference alignment, which achieves the optimal worst-case communication load under the constraint of uncoded storage for all system parameters. In addition, under the constraint of uncoded storage, the coded data shuffling scheme proposed in [12] was shown to be optimal for any shuffle (not just in the worst case) when q = 1.…”
mentioning
confidence: 99%