Proceedings of the 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming 2021
DOI: 10.1145/3437801.3441593

DAPPLE: A Pipelined Data Parallel Approach for Training Large Models

Cited by 97 publications (20 citation statements)
References 16 publications
“…Pipeline parallelism splits a mini-batch into smaller micro-batches and pipelines them to the DNN model stages hosted on different workers so that workers can process different micro-batches simultaneously [12], [17], [24], [25]. Point-to-point communication is performed between workers hosting neighboring stages to transfer intermediate activations.…”
Section: Distributed DNN Training (mentioning)
confidence: 99%
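To make the micro-batching mechanism in the quote above concrete, here is a minimal single-process sketch (my own illustration, not DAPPLE's or any cited system's implementation): a mini-batch is chunked into micro-batches and each one flows through the chain of stages, where the stage-to-stage call stands in for the point-to-point activation transfer between workers.

```python
# Minimal sketch of micro-batched pipeline forward execution.
# All names (PipelineStage, forward_pipeline) are illustrative assumptions;
# a real deployment hosts each stage on its own worker and replaces the
# stage-to-stage call with a point-to-point send/recv of the activation.
import torch
import torch.nn as nn

class PipelineStage(nn.Module):
    """One contiguous slice of the model, hosted by one worker."""
    def __init__(self, layers: nn.Sequential):
        super().__init__()
        self.layers = layers

    def forward(self, activation: torch.Tensor) -> torch.Tensor:
        return self.layers(activation)

def forward_pipeline(stages, minibatch: torch.Tensor, num_microbatches: int):
    """Split a mini-batch into micro-batches and push each through all stages."""
    outputs = []
    for micro in torch.chunk(minibatch, num_microbatches, dim=0):
        act = micro
        for stage in stages:  # neighboring stages exchange the activation
            act = stage(act)
        outputs.append(act)
    return torch.cat(outputs, dim=0)
```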
“…next iteration. Despite better model accuracy, pipeline flush causes worker idling (i.e., bubbles) in pipeline execution [5], [12], [24], [25]. For GPipe and 1F1B, the ratio of the bubble time is (p − 1)/(m + p − 1), where p is the number of stages and m is the number of micro-batches [5].…”
Section: Distributed DNN Training (mentioning)
confidence: 99%
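As a quick sanity check of that bubble-time ratio, a tiny helper (names are my own) evaluates (p − 1)/(m + p − 1) and shows how raising the number of micro-batches relative to the number of stages shrinks the idle fraction.

```python
def bubble_ratio(p: int, m: int) -> float:
    """Bubble-time fraction (p - 1) / (m + p - 1) for GPipe/1F1B-style schedules,
    where p is the number of stages and m the number of micro-batches."""
    return (p - 1) / (m + p - 1)

print(bubble_ratio(p=4, m=16))  # 3/19 ≈ 0.158 -> ~15.8% of pipeline time is idle
print(bubble_ratio(p=4, m=64))  # 3/67 ≈ 0.045 -> more micro-batches, smaller bubble
```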
“…A recent line of work adds pipelines into model parallelism by partitioning model layers into parallel stages [7], [8], [45], [46], [47], [48], [49]. In this way, each training batch is divided into micro-batches to be processed by pipeline stages across computing devices.…”
Section: Further Analysis, 1) Training Efficiency (mentioning)
confidence: 99%
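The layer-partitioning step described in this last quote can be sketched as a naive contiguous split by layer count; this is an assumption for illustration only, since actual pipeline planners balance stages by profiled compute and memory cost rather than by counting layers.

```python
import torch.nn as nn

def partition_layers(layers, num_stages: int):
    """Naively split an ordered layer list into contiguous pipeline stages."""
    per_stage = (len(layers) + num_stages - 1) // num_stages  # ceiling division
    return [nn.Sequential(*layers[i:i + per_stage])
            for i in range(0, len(layers), per_stage)]

# Example: an 8-layer stack split across 4 devices, 2 layers per stage.
stages = partition_layers([nn.Linear(128, 128) for _ in range(8)], num_stages=4)
assert len(stages) == 4
```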