2021 IEEE International Symposium on High-Performance Computer Architecture (HPCA)
DOI: 10.1109/hpca51647.2021.00057

Sentinel: Efficient Tensor Migration and Allocation on Heterogeneous Memory Systems for Deep Learning

Cited by 28 publications (14 citation statements) · References 25 publications

Citation statements from citing publications (ordered by relevance):
“…We observe, however, that such parallel processing of multiple mini-batches invoke complex data hazards so ScratchPipe employs a novel hazard resolution mechanism to guarantee that the algorithmic nature of RecSys training is not altered. While not specifically focusing on recommendation models, there is a rich set of prior literature exploring heterogeneous memory systems for training large-scale ML algorithms [39], [40], [41], [42], [43], [44], [45], [46], [47], [48], [49]. In general, the key contribution of our ScratchPipe is orthogonal to these prior studies.…”
Section: Related Work
confidence: 99%
“…The second approach uses compression techniques such as using low or mixed precision [16] for model training, saving on both model states and activations. The third approach uses an external memory such as the CPU memory as an extension of GPU memory to increase memory capacity during training [8,9,11,17,23,24,33].…”
Section: Background and Related Work
confidence: 99%
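The mixed-precision ("compression") approach summarized in the statement above can be sketched in a few lines. This is a minimal illustration assuming PyTorch's torch.cuda.amp utilities; the model, optimizer, loss_fn, and data names are placeholders, not code from the cited papers.

```python
import torch

# Sketch of one mixed-precision training step (illustrative, not the cited papers' code).
# The forward pass runs under autocast so activations are kept in reduced precision,
# shrinking activation memory; GradScaler guards against FP16 gradient underflow.
scaler = torch.cuda.amp.GradScaler()

def train_step(model, optimizer, loss_fn, inputs, targets):
    optimizer.zero_grad(set_to_none=True)
    with torch.cuda.amp.autocast():
        outputs = model(inputs)
        loss = loss_fn(outputs, targets)
    scaler.scale(loss).backward()   # backward on the scaled loss
    scaler.step(optimizer)          # unscales gradients, then applies the update
    scaler.update()                 # adjusts the loss scale for the next step
    return loss.item()
```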
“…Heterogeneous DL training is a promising approach to reduce GPU memory requirement by exploiting CPU memory. Many efforts have been made in this direction [8,9,11,17,23,24,32,33,34]. Nearly all of them target CNN based models, where activation memory is the memory bottleneck, and model size is fairly small (less than 500M).…”
Section: Introduction
confidence: 99%
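As a concrete illustration of the heterogeneous (CPU + GPU) memory idea these works share, the sketch below evicts a GPU tensor to pinned host memory and prefetches it back on a side CUDA stream. It is a minimal example built on plain PyTorch copies, with illustrative function names; it does not reproduce the mechanism of any specific cited system.

```python
import torch

copy_stream = torch.cuda.Stream()   # side stream so transfers can overlap compute

def offload_to_host(t: torch.Tensor) -> torch.Tensor:
    """Evict a GPU tensor to pinned CPU memory (asynchronous copy)."""
    host = torch.empty(t.shape, dtype=t.dtype, device="cpu", pin_memory=True)
    with torch.cuda.stream(copy_stream):
        host.copy_(t, non_blocking=True)
    return host

def prefetch_to_device(host: torch.Tensor, device: str = "cuda") -> torch.Tensor:
    """Bring an evicted tensor back to GPU memory ahead of its next use."""
    with torch.cuda.stream(copy_stream):
        return host.to(device, non_blocking=True)

# Before reading a prefetched tensor, the compute stream must wait on the copies:
# torch.cuda.current_stream().wait_stream(copy_stream)
```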
“…Memory over-commitment in NN training. Prior work studies using storage or slow memory (e.g., host memory) as an extension of fast memory (e.g., GPU memory) to increase memory capacity for NN training (Rhu et al, 2016;Hildebrand et al, 2020;Huang et al, 2020;Peng et al, 2020;Jin et al, 2018;Ren et al, 2021). However, most of these works target at optimizing the conventional offline learning scenarios by swapping optimizer states, activations, or model weights between the fast memory and slow memory (or storage), whereas we focus on swapping samples in between episodic memory and storage to tackle the forgetting problem in the context of continual learning.…”
Section: Related Work
confidence: 99%
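To make the swapping idea concrete, here is a minimal sketch of a fixed-capacity fast-memory cache that spills tensors to slow storage and faults them back in on access. The SwappedBuffer class, its eviction policy, and the file layout are assumptions for illustration only, not the design of the cited offloading or continual-learning systems.

```python
import os
import torch

class SwappedBuffer:
    """Keep at most `capacity` tensors resident in fast memory; spill the rest to disk."""

    def __init__(self, spill_dir: str, capacity: int):
        self.spill_dir = spill_dir
        self.capacity = capacity      # max tensors kept resident in RAM
        self.resident = {}            # key -> tensor currently in fast memory
        os.makedirs(spill_dir, exist_ok=True)

    def put(self, key: str, tensor: torch.Tensor):
        if len(self.resident) >= self.capacity:
            # Evict one resident tensor to slow storage (LIFO here, for simplicity).
            victim, value = self.resident.popitem()
            torch.save(value, os.path.join(self.spill_dir, f"{victim}.pt"))
        self.resident[key] = tensor

    def get(self, key: str) -> torch.Tensor:
        if key not in self.resident:
            # Fault the tensor back in from slow storage on demand.
            path = os.path.join(self.spill_dir, f"{key}.pt")
            self.put(key, torch.load(path))
        return self.resident[key]
```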