2023 IEEE International Parallel and Distributed Processing Symposium (IPDPS)
DOI: 10.1109/ipdps54959.2023.00031
An Efficient 2D Method for Training Super-Large Deep Learning Models

Cited by 10 publications (2 citation statements)
References 8 publications
“…Model parallelism splits the model across multiple GPUs, each handling different stages. The model parallelism includes two categories: pipeline parallelism [42,72,83], placing individual layers on single GPUs, and tensor parallelism [28,30,45], dividing each tensor into chunks for specific GPUs.…”
Section: Distributed Training (mentioning)
confidence: 99%
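To make the distinction in the statement above concrete, the following is a minimal NumPy sketch of the general tensor-parallelism idea (splitting a tensor into per-device chunks), not the paper's specific 2D partitioning scheme; all sizes and names are illustrative assumptions.

import numpy as np

# Hypothetical toy sizes, chosen only for illustration.
batch, d_in, d_out, n_devices = 4, 8, 8, 2

rng = np.random.default_rng(0)
x = rng.standard_normal((batch, d_in))
w = rng.standard_normal((d_in, d_out))

# Tensor parallelism: split the weight column-wise, one shard per "device".
w_shards = np.split(w, n_devices, axis=1)

# Each "device" computes a partial output with its own shard.
partial_outputs = [x @ shard for shard in w_shards]

# Concatenating the partial outputs reproduces the full layer output.
y_parallel = np.concatenate(partial_outputs, axis=1)
assert np.allclose(y_parallel, x @ w)

Pipeline parallelism, by contrast, would place whole layers on different devices and pass activations between them stage by stage.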
“…On the other side, the rapid growth in the memory requirements of large-scale DNN models [2,75] has sparked the development of methods at the system- and algorithm-level to alleviate memory demands. Examples for these methods include recomputation [40,86], offloading [69], distributed training [28,30,42,45,72,83] and low-rank adaptation [29]. Even though these optimizations can effectively reduce memory footprint for training or fine-tuning large-scale DNN models, they may lead to poor memory utilization.…”
Section: Introduction (mentioning)
confidence: 99%
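As a loose illustration of the recomputation technique mentioned in this statement, here is a minimal NumPy sketch: instead of storing every intermediate activation for the backward pass, only periodic checkpoints are kept and the missing activations are re-run from the nearest checkpoint when needed. The helper names (forward_with_checkpoints, recompute_segment) and the toy layer are invented for this sketch and are not from the cited works.

import numpy as np

def layer(x, w):
    # Illustrative layer; any differentiable op would do.
    return np.tanh(x @ w)

def forward_store_all(x, weights):
    # Baseline: keep every intermediate activation for the backward pass.
    acts = [x]
    for w in weights:
        acts.append(layer(acts[-1], w))
    return acts  # memory grows linearly with depth

def forward_with_checkpoints(x, weights, every=2):
    # Recomputation: keep only every `every`-th activation ("checkpoints").
    ckpts = {0: x}
    for i, w in enumerate(weights):
        x = layer(x, w)
        if (i + 1) % every == 0:
            ckpts[i + 1] = x
    return ckpts

def recompute_segment(ckpts, weights, target_idx, every=2):
    # During backward, re-run the forward from the nearest stored
    # checkpoint to rebuild the activation at `target_idx`.
    start = (target_idx // every) * every
    x = ckpts[start]
    for i in range(start, target_idx):
        x = layer(x, weights[i])
    return x

rng = np.random.default_rng(0)
weights = [rng.standard_normal((8, 8)) for _ in range(6)]
x0 = rng.standard_normal((4, 8))

full = forward_store_all(x0, weights)
ckpts = forward_with_checkpoints(x0, weights, every=2)
# The recomputed activation matches the one the baseline had stored.
assert np.allclose(recompute_segment(ckpts, weights, 3, every=2), full[3])

This trades extra forward compute for a smaller activation footprint, which is the same memory-versus-compute trade-off the citing authors group together with offloading, distributed training, and low-rank adaptation.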