2024
DOI: 10.3390/app14198848

Check-QZP: A Lightweight Checkpoint Mechanism for Deep Learning Frameworks

Sangheon Lee,
Gyupin Moon,
Chanyong Lee
et al.

Abstract: In deep learning (DL) frameworks, a checkpoint operation is widely used to store the values of intermediate variables (e.g., weights, biases, and gradients) on storage media. This operation reduces the time needed to recover a running machine learning (ML) model after sudden power failures or random crashes. However, the checkpoint operation can stall the overall training step of the running model and waste expensive hardware resources by leaving the GPU idle while the checkpoint is being written. In addition, th…
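To make the stall described above concrete, the following is a minimal sketch of a generic synchronous checkpoint. It is not the Check-QZP mechanism, which the truncated abstract does not describe. It assumes the model state is a plain Python dict serialized with `pickle`. The names `save_checkpoint`, `load_checkpoint`, and `train`, as well as the toy parameter update, are hypothetical and for illustration only.

```python
import os
import pickle
import tempfile


def save_checkpoint(state: dict, path: str) -> None:
    """Synchronously write `state` to `path` (hypothetical helper).

    The caller blocks until the data reaches storage; in a real DL job the
    GPU would sit idle during this time.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file first so a crash mid-write cannot corrupt
    # the previous checkpoint.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            pickle.dump(state, f)
            f.flush()
            os.fsync(f.fileno())  # force the bytes onto the storage medium
        os.replace(tmp_path, path)  # atomically swap in the new checkpoint
    except BaseException:
        os.remove(tmp_path)
        raise


def load_checkpoint(path: str) -> dict:
    """Read back the most recent checkpoint (hypothetical helper)."""
    with open(path, "rb") as f:
        return pickle.load(f)


def train(num_steps: int, ckpt_path: str, ckpt_every: int = 10) -> dict:
    """Toy training loop that resumes from a checkpoint if one exists."""
    if os.path.exists(ckpt_path):
        state = load_checkpoint(ckpt_path)  # recovery after a crash
    else:
        state = {"step": 0, "weights": [0.0, 0.0], "bias": 0.0}
    while state["step"] < num_steps:
        # Placeholder "training step": nudge each parameter toward 1.0.
        state["weights"] = [w + 0.1 * (1.0 - w) for w in state["weights"]]
        state["bias"] += 0.1 * (1.0 - state["bias"])
        state["step"] += 1
        if state["step"] % ckpt_every == 0:
            save_checkpoint(state, ckpt_path)  # training stalls here
    return state
```

For example, calling `train(25, path)` writes a checkpoint at step 20. If the process then crashed, a later call to `train(30, path)` would resume from step 20 instead of step 0. The temporary-file-then-rename pattern is a common way to keep the last good checkpoint intact if a failure occurs mid-write. Because `save_checkpoint` runs inline, every checkpoint pauses the loop for the full serialization and `fsync` time, which is the overhead the abstract identifies.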

Cited by 0 publications
References 43 publications