SC20: International Conference for High Performance Computing, Networking, Storage and Analysis 2020
DOI: 10.1109/sc41405.2020.00049

GEMS: GPU-Enabled Memory-Aware Model-Parallelism System for Distributed DNN Training

Cited by 28 publications (13 citation statements)
References 12 publications
“…Machine learning (ML) applications are playing an increasingly important role in modern high-performance computing (HPC) systems. Besides the optimization of gigantic neural network model training [3,8,17,25,29], the use of HPC plus artificial intelligence (AI) to solve scientific problems is gaining momentum [5,20,22,23]. One example is machine-learning molecular dynamics (MLMD), which aims to bridge the gap between first-principles accuracy and Newtonian MD efficiency [9,37].…”
Section: Introduction
confidence: 99%
“…It is a synchronous weight update technique that schedules backward passes of each micro-batch as early as possible to release the memory occupied by activations. Gems [Jain et al, 2020a] and Chimera [Li and Hoefler, 2021] implement bidirectional pipelines, where each GPU serves two pipeline stages (stages i and P − i, where P is the number of stages). The design of Gems is mostly concerned with activation memory: the forward pass of the next micro-batch starts only after the first backward stage of the previous micro-batch has been computed and its activation memory has been released.…”
Section: Offloading of Weights
confidence: 99%
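The snippet above describes two mechanisms: the bidirectional stage-to-GPU mapping shared by Gems and Chimera, and Gems' memory-aware rule for admitting the next micro-batch's forward pass. The Python sketch below illustrates both under stated assumptions (0-based stage indices, invented function names); it is not code from either paper.

```python
# Minimal sketch (not the authors' code) of two ideas quoted above:
# (1) the bidirectional stage-to-GPU mapping, where each GPU serves two
#     pipeline stages, and
# (2) the Gems-style memory-aware rule that the forward pass of the next
#     micro-batch may start only after the first backward stage of the
#     previous micro-batch has run and released its activation memory.
# Function names and the 0-based stage convention are assumptions.

def bidirectional_stage_map(num_stages: int) -> dict[int, tuple[int, int]]:
    """Map GPU g to its two stages: stage g of the 'down' pipeline and
    stage num_stages - 1 - g of the 'up' pipeline."""
    return {g: (g, num_stages - 1 - g) for g in range(num_stages)}

def can_start_forward(microbatch: int,
                      finished_backward_stages: dict[int, set[int]],
                      num_stages: int) -> bool:
    """Admission rule quoted above: micro-batch m may begin its forward pass
    once the first backward stage (the last pipeline stage, num_stages - 1)
    of micro-batch m - 1 has completed and freed its activations."""
    if microbatch == 0:
        return True
    return (num_stages - 1) in finished_backward_stages.get(microbatch - 1, set())

if __name__ == "__main__":
    print(bidirectional_stage_map(4))       # {0: (0, 3), 1: (1, 2), 2: (2, 1), 3: (3, 0)}
    done = {0: {3}}                          # micro-batch 0 finished backward on stage 3
    print(can_start_forward(1, done, num_stages=4))  # True: stage-3 activations released
```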
“…Het-Pipe [Park et al, 2020] addresses the additional problem of heterogeneous GPUs by grouping them into virtual workers and running pipeline parallelism within each virtual worker, while relying on data parallelism between workers. Varuna [Athlur et al, 2021] […] activation recomputations and respective backward passes are scheduled opportunistically.…”

The statement also summarizes related systems (system | parallelism | pipeline schedule | partitioning strategy):
HetPipe [Park et al, 2020] | DP, PP | Parameter Server | LinProg for PP
Pipe-torch [Zhan and Zhang, 2019] | DP, PP | Async Update | DynProg for DP, PP, GPU allocation
Varuna [Athlur et al, 2021] | DP, PP | Opportunistic Backward Scheduling | Heuristic PP partition, Bruteforce for DP, PP depth
Gems [Jain et al, 2020a] | DP, PP | Bidirectional Pipeline | -
Chimera [Li and Hoefler, 2021] | DP, PP | 1F1B, Bidirectional Pipeline | Greedy mini-batch size, Bruteforce for DP, PP depth

Section: Several Papers Specifically Target Challenging Topologies
confidence: 99%
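As a rough illustration of the HetPipe layout quoted above (pipeline parallelism within each virtual worker, data parallelism across them), the sketch below groups a heterogeneous set of GPUs into virtual workers. The class and function names and the round-robin grouping heuristic are assumptions for illustration, not HetPipe's actual placement algorithm.

```python
# Minimal sketch (not from the cited papers): heterogeneous GPUs are grouped
# into "virtual workers"; pipeline parallelism runs inside each virtual worker,
# and data parallelism (gradient sync) runs across virtual workers.
from dataclasses import dataclass

@dataclass
class GPU:
    gpu_id: int
    model: str  # e.g. "V100", "P100" -- heterogeneous device types

def group_into_virtual_workers(gpus: list[GPU], gpus_per_worker: int) -> list[list[GPU]]:
    """Sort by device type, then round-robin across workers so each virtual
    worker (one pipeline) receives a mix of GPU models."""
    by_model = sorted(gpus, key=lambda g: g.model)
    num_workers = len(gpus) // gpus_per_worker
    workers: list[list[GPU]] = [[] for _ in range(num_workers)]
    for i, gpu in enumerate(by_model):
        workers[i % num_workers].append(gpu)
    return workers

if __name__ == "__main__":
    cluster = [GPU(i, "V100") for i in range(4)] + [GPU(4 + i, "P100") for i in range(4)]
    virtual_workers = group_into_virtual_workers(cluster, gpus_per_worker=4)
    # Each inner list is one pipeline; data parallelism runs between the lists.
    for w, gpus in enumerate(virtual_workers):
        print(f"virtual worker {w}: pipeline stages on GPUs {[g.gpu_id for g in gpus]}")
```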
“…The GPU-Enabled Memory-Aware Model-Parallelism System (GEMS) has been proposed to train large-scale deep learning models on high-resolution images, which are mainly used in digital pathology [23]. Their paper proposes four techniques: GEMS-Basic, GEMS-MAST, GEMS-MASTER, and GEMS-Hybrid.…”
Section: Related Work
confidence: 99%