2022
DOI: 10.48550/arxiv.2205.00119
Preprint

MiCS: Near-linear Scaling for Training Gigantic Model on Public Cloud

Abstract: Existing general purpose frameworks for gigantic model training, i.e., models with billions to trillions of parameters, cannot scale efficiently on public cloud environments due to large communication overheads. In this paper, we propose MiCS, which Minimizes the Communication Scale to bring down communication overhead. Specifically, by decreasing the number of participants in a communication collective, MiCS can utilize existing heterogeneous network bandwidth on the cloud, reduce network traffic over slower …
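The mechanism the abstract describes, keeping model-state shards and their collectives within a small subgroup of ranks so that each all-gather or reduce-scatter spans fewer participants, can be illustrated with a minimal PyTorch-style sketch. This is a hedged illustration under assumptions, not MiCS's actual implementation: the partition size of 8, the helper name build_partition_groups, and the NCCL setup are hypothetical choices made for the example.

```python
# Illustrative sketch (not the authors' code): shrinking the participant set
# of a collective so traffic stays on faster intra-node links.
import os
import torch
import torch.distributed as dist

def build_partition_groups(partition_size: int):
    """Split the world into disjoint groups of `partition_size` ranks.

    Model states are then sharded only within a group, so each collective
    involves `partition_size` participants instead of the full world size.
    """
    world_size = dist.get_world_size()
    rank = dist.get_rank()
    assert world_size % partition_size == 0
    my_group = None
    for start in range(0, world_size, partition_size):
        ranks = list(range(start, start + partition_size))
        # Every rank must call new_group, even for groups it does not join.
        group = dist.new_group(ranks)
        if rank in ranks:
            my_group = group
    return my_group

if __name__ == "__main__":
    dist.init_process_group(backend="nccl")
    torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))
    # Assumed 8 GPUs per node: each collective stays within one node, where
    # NVLink/PCIe bandwidth is much higher than the inter-node network.
    group = build_partition_groups(partition_size=8)
    shard = torch.ones(1024, device="cuda") * dist.get_rank()
    gathered = [torch.empty_like(shard) for _ in range(dist.get_world_size(group))]
    dist.all_gather(gathered, shard, group=group)  # spans 8 ranks, not all
```

With the group argument, the all-gather above touches only the 8 ranks of one partition; without it, the same call would cross the slower inter-node links for every shard, which is the overhead the paper targets.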

Cited by 1 publication (1 citation statement)
References 30 publications
“…AI researchers determined that the Performance of AI Systems is provided via three (03) powers: Hardware (providing computational power), Software (including hard code, algorithms, AI models and frameworks), and Data (providing the workpiece for AI) [33,34,35,36]. For this reason, "three of the main areas where significant innovation will be traced back to AI are hardware, software, and data."…”
Section: Performance of AI Systems (mentioning)
Confidence: 99%