2022
DOI: 10.48550/arxiv.2208.02498
Preprint

A Container-Based Workflow for Distributed Training of Deep Learning Algorithms in HPC Clusters

Abstract: Deep learning has been postulated as a solution for numerous problems in different branches of science. Given the resource-intensive nature of these models, they often need to be executed on specialized hardware such as graphics processing units (GPUs) in a distributed manner. In academia, researchers access these resources through High Performance Computing (HPC) clusters. This kind of infrastructure makes the training of these models difficult due to its multi-user nature and limited …

Cited by 0 publications
References 44 publications