Dynamic Tensor Rematerialization
Preprint, 2020
DOI: 10.48550/arxiv.2006.09616

Cited by 11 publications (29 citation statements)
References 0 publications
“…Techniques for training large-scale models. In addition to parallelization, there are other complementary techniques for training large-scale models, such as memory optimization [8,10,17,19,24,42], communication compression [4,48], and low-precision training [31]. Alpa can incorporate many of these techniques.…”
Section: Related Work (mentioning)
confidence: 99%
“…For example, [Jain et al, 2020b] proposed an Integer Linear Program (ILP) to find the optimal rematerialization strategy for an arbitrary Directed Acyclic Graph (DAG) structure. [Kirisame et al, 2020] presented DTR, a cheap dynamic heuristic that relies on scores favoring the discarding of (i) heavy tensors, (ii) with a long lifetime, and (iii) that can be easily recomputed.…”
Section: Rematerialization of Activations (mentioning)
confidence: 99%
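The scoring idea in the quotation above can be illustrated with a short, self-contained Python sketch. The names (TensorMeta, eviction_score, pick_victim) and the exact formula are illustrative assumptions rather than the authors' implementation; the point is only that large, stale, and cheaply recomputed tensors receive the lowest scores and are evicted first.

from dataclasses import dataclass
import time

@dataclass
class TensorMeta:
    compute_cost: float   # estimated time to recompute the tensor (seconds)
    memory_bytes: int     # size of the tensor in memory
    last_access: float    # timestamp of the most recent access

def eviction_score(t: TensorMeta, now: float) -> float:
    # Lower score = better eviction candidate: cheap to recompute,
    # large in memory, and stale (not accessed for a long time).
    staleness = max(now - t.last_access, 1e-9)
    return t.compute_cost / (t.memory_bytes * staleness)

def pick_victim(pool: dict) -> str:
    # Evict the tensor with the lowest score; it can be recomputed on demand.
    now = time.monotonic()
    return min(pool, key=lambda name: eviction_score(pool[name], now))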
“…Gradient checkpointing [18,19,20,21] trades computation for memory by dropping some of the activations in the forward pass and recomputing them in the backward pass. Swapping [22,23,24,25,26] utilizes the huge amount of available CPU memory by swapping tensors between CPU and GPU.…”
Section: Memory-efficient Training Systems (mentioning)
confidence: 99%
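As an illustration of the checkpointing idea described in the quotation above, here is a minimal PyTorch sketch (generic, not tied to any of the cited systems; it assumes a recent PyTorch where torch.utils.checkpoint accepts use_reentrant). Activations inside the wrapped block are not kept after the forward pass and are recomputed during backward.

import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

# A block whose intermediate activations we do not want to keep in memory.
block = nn.Sequential(nn.Linear(1024, 4096), nn.ReLU(), nn.Linear(4096, 1024))
x = torch.randn(8, 1024, requires_grad=True)

# Forward: the block's activations are dropped; backward: they are recomputed.
y = checkpoint(block, x, use_reentrant=False)
y.sum().backward()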
“…We implement highly-optimized … Table 1: Usability Comparison of Memory Saving Systems. The systems include Checkmate [19], DTR [21],…”
Section: Activation Compressed Layers (mentioning)
confidence: 99%