2017 IEEE International Conference on Big Data (Big Data)
DOI: 10.1109/bigdata.2017.8257926
ooc_cuDNN: Accommodating convolutional neural networks over GPU memory capacity

Cited by 16 publications (11 citation statements); references 9 publications.
“…Many approaches, such as pipelining [38]-[41] and micro-batching [42], are orthogonal to our 3D spatial partitioning. Others directly target memory pressure during training but perform additional computation, including gradient accumulation [43], out-of-core algorithms [44]-[46], and recomputation [47], [48].…”
Section: Related Work (citation type: mentioning)
confidence: 99%
“…vDNN [51] is a memory manager that virtualizes GPU memory in DNN training. ooc_cuDNN [25] extends cuDNN and applies cuDNN-compatible operators even when a layer exceeds GPU memory capacity, by swapping at the granularity of individual tensor dimensions. Gradient checkpointing [10] reduces the memory needed to store the intermediate outputs and gradients at the cost of doubling the forward-pass computation [10,26].…”
Section: Related Work (citation type: mentioning)
confidence: 99%
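The dimension-wise swapping mentioned in the statement above can be illustrated with a small sketch. This is not the ooc_cuDNN API; it is a hedged approximation using PyTorch as a stand-in for cuDNN, assuming a bias-free convolution split along the input-channel dimension, so that only one channel slice and the partial output are GPU-resident at a time (the function name `ooc_conv2d` and the `chunk` parameter are illustrative).

```python
# Hedged sketch (not the ooc_cuDNN API) of out-of-core convolution that
# swaps data in and out along one tensor dimension. Splitting the
# input-channel dimension lets partial outputs be summed, so only one
# channel slice of the input and filters is on the GPU at a time.
import torch
import torch.nn.functional as F

def ooc_conv2d(x_cpu, w_cpu, chunk=64, device="cuda"):
    """x_cpu: (N, C, H, W) input, w_cpu: (K, C, R, S) filters, both on the
    host; peak GPU memory scales with `chunk`, not with C."""
    out = None
    for c0 in range(0, x_cpu.shape[1], chunk):
        c1 = min(c0 + chunk, x_cpu.shape[1])
        x_part = x_cpu[:, c0:c1].to(device)   # swap one input slice in
        w_part = w_cpu[:, c0:c1].to(device)   # matching filter slice
        partial = F.conv2d(x_part, w_part)    # partial sum over channels
        out = partial if out is None else out + partial
        del x_part, w_part                    # let the slice be freed
    return out.cpu()                          # swap the result back out
```

Splitting along the channel dimension keeps the partial results summable; splitting along the batch or spatial dimensions would instead require concatenating outputs, which is another granularity at which a layer can be tiled when it exceeds GPU memory.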
“…Several approaches to alleviating memory pressure on GPUs have been used. If at least one sample fits in GPU memory, an out-of-core "micro-batching" approach can be used, where mini-batches are split into micro-batches and updates are accumulated, although this can increase training time [43]. Other approaches use recomputation to avoid keeping intermediate values [44].…”
Section: Related Work (citation type: mentioning)
confidence: 99%
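A minimal sketch of the micro-batching scheme described above, assuming a PyTorch-style training loop (the function `train_step` and its arguments are illustrative, not from the cited work): the mini-batch is split into micro-batches, gradients are accumulated across them, and a single optimizer update is applied, trading extra iterations (and thus training time) for lower peak memory.

```python
# Hedged sketch of micro-batching with gradient accumulation.
# Gradients accumulate over micro-batches; the optimizer steps once
# per mini-batch, so peak activation memory scales with micro_size.
import torch

def train_step(model, optimizer, loss_fn, mini_batch, targets, micro_size):
    optimizer.zero_grad()
    n = mini_batch.shape[0]
    for i in range(0, n, micro_size):
        xb = mini_batch[i:i + micro_size].cuda()
        yb = targets[i:i + micro_size].cuda()
        # scale so the accumulated gradient equals the full mini-batch mean
        loss = loss_fn(model(xb), yb) * (xb.shape[0] / n)
        loss.backward()              # gradients accumulate in .grad buffers
    optimizer.step()                 # one parameter update per mini-batch
```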