2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO)
DOI: 10.1109/micro.2016.7783721

vDNN: Virtualized deep neural networks for scalable, memory-efficient neural network design

Abstract: The most widely used machine learning frameworks require users to carefully tune their memory usage so that the deep neural network (DNN) fits into the DRAM capacity of a GPU. This restriction hampers a researcher's flexibility to study different machine learning algorithms, forcing them to either use a less desirable network architecture or parallelize the processing across multiple GPUs. We propose a runtime memory manager that virtualizes the memory usage of DNNs such that both GPU and CPU memory can simultaneously…
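The mechanism the abstract describes, keeping a layer's data in CPU memory while the GPU works on other layers, can be sketched in a few lines. The following is a minimal sketch assuming PyTorch with a CUDA device; it is not the paper's vDNN implementation, and the names `offload_stream`, `offload`, and `prefetch` are mine.

```python
import torch

# Illustrative sketch only, not the paper's vDNN implementation: offload a
# layer's feature map to pinned host (CPU) memory on a side stream once the
# forward pass no longer needs it, then prefetch it back before the backward
# pass reuses it.

offload_stream = torch.cuda.Stream()  # copies on this stream overlap compute

def offload(feature_map: torch.Tensor) -> torch.Tensor:
    """Asynchronously copy a GPU tensor into pinned CPU memory."""
    host_buf = torch.empty_like(feature_map, device="cpu").pin_memory()
    with torch.cuda.stream(offload_stream):
        host_buf.copy_(feature_map, non_blocking=True)
    return host_buf

def prefetch(host_buf: torch.Tensor) -> torch.Tensor:
    """Bring an offloaded tensor back to the GPU ahead of its backward use."""
    with torch.cuda.stream(offload_stream):
        return host_buf.to("cuda", non_blocking=True)
```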

Cited by 275 publications (212 citation statements)
References 28 publications (49 reference statements)
“…As such, the (GBs of) NPU local memory will be large enough to preserve tens of preempted tasks' context state. If the multiple checkpointed states oversubscribe NPU memory, the approach taken by Rhu et al. [39] can similarly be employed to handle memory oversubscription via copying overflowing data to the CPU memory. Concretely, when the runtime observes that NPU memory usage is nearing its limit, the DMA unit can proactively migrate some of the checkpointed state from NPU to CPU memory while the inference request is being serviced to hide migration overhead.…”
Section: G. Storage Overhead of Preemption (mentioning)
confidence: 99%
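As a rough illustration of the watermark policy this statement describes, the sketch below drains checkpointed buffers to host memory once device usage crosses a threshold. It is a hypothetical sketch: `HIGH_WATERMARK`, the checkpoint objects, and `migrate_to_host_async` are invented for illustration and do not come from the cited papers.

```python
HIGH_WATERMARK = 0.90  # hypothetical fraction of device memory that triggers migration

def maybe_migrate(device_used: int, device_total: int, checkpoints: list) -> int:
    """Proactively evict checkpointed state to host memory near the limit.

    `checkpoints` holds objects with an `nbytes` size and an (invented)
    `migrate_to_host_async()` method standing in for a DMA copy that the
    runtime overlaps with ongoing inference work.
    """
    while checkpoints and device_used / device_total > HIGH_WATERMARK:
        buf = checkpoints.pop(0)       # oldest checkpoint first (FIFO)
        buf.migrate_to_host_async()    # DMA to CPU memory, overlapped with compute
        device_used -= buf.nbytes
    return device_used
```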
“…Memory-overlaying for DNN Virtual Memory. We implemented the runtime memory management policy as described in [9], [30], [10], [52], which leverages the network DAG to analyze inter-layer data dependencies and schedule memory-overlaying operations for virtual memory. Under our implementation, the device memory is utilized as an application-level cache with respect to the host memory.…”
Section: Methods (mentioning)
confidence: 99%
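A minimal sketch of the scheduling idea in this quote, assuming the simplest possible DAG (a linear chain of layers); the function and field names are mine, not the implementation from [9], [30], [10], [52].

```python
def schedule_overlays(num_layers: int) -> dict[int, dict[str, int]]:
    """Derive per-layer offload/prefetch points from a linear-chain DAG."""
    schedule = {}
    for i in range(num_layers - 1):
        schedule[i] = {
            # layer i's output is last read (forward) by layer i + 1, so it
            # can be evicted to host memory once that layer finishes
            "offload_after_fwd_of": i + 1,
            # backward visits layers in reverse order, so start refilling
            # the device copy while layer i + 1's backward is still running
            "prefetch_during_bwd_of": i + 1,
        }
    return schedule
```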
“…memory usage of DNNs [9], [10], [13], [14], [15], [16] have proposed to utilize both host and device memory concurrently for allocating data structures for DNN training. By leveraging the user-level DNN topology graph as a means to extract compile-time data dependency information (encapsulated as a directed acyclic graph (DAG) data structure) for the memory-hungry data structures, e.g., feature maps (X) and/or weights (W), DNN virtual memory can use this dependency information to derive the DNN data reuse distance and schedule performance-aware data copy operations, memory-overlaying across host and device memory via PCIe [27], [28], [29].…”
Section: B. Virtualizing Memory for Deep Learning (mentioning)
confidence: 99%
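The reuse-distance derivation this quote mentions can be sketched as a single pass over a topologically ordered access trace; the sketch below is my own construction for illustration, not code from the cited works. A large gap between consecutive uses of a tensor means its host/device copy can be hidden behind the intervening layers' compute.

```python
def reuse_distances(trace: list[list[str]]) -> dict[str, int]:
    """trace[step] lists the tensor names accessed at execution step `step`.

    Returns the smallest gap (in steps) between consecutive uses of each
    tensor, e.g. reuse_distances([["x0"], ["x0", "x1"], ["x1"], ["x0"]])
    yields {"x0": 1, "x1": 1}.
    """
    last_seen: dict[str, int] = {}
    dist: dict[str, int] = {}
    for step, tensors in enumerate(trace):
        for name in tensors:
            if name in last_seen:
                gap = step - last_seen[name]
                dist[name] = min(dist.get(name, gap), gap)
            last_seen[name] = step
    return dist
```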
“…The first approach involves using the data-swapping method which is proposed in this paper. M. N. Rhu et al. [7] and Meng et al. [8] also used this approach. They used popular neural networks such as ResNet-50 for evaluation and focused primarily on the increase in batch size.…”
Section: Related Work (mentioning)
confidence: 99%