Scalable GPU Virtualization with Dynamic Sharing of Graphics Memory Space

Xue, Mochi; Ma, Jiacheng; Li, Wentai; Tian, Kun; Dong, Yang; Wu, Jiankang; Qi, Zhengwei; He, Bingsheng; Guan, Haibing

doi:10.1109/tpds.2018.2789883

Cited by 14 publications

(12 citation statements)

References 19 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…e gVirt allows VMs to directly access resources that have a large effect on performance and make other privileged operations be intervened through a hypervisor. Due to the restriction on the number of simultaneous VMs in the gVirt, we modified the original gVirt over Xen hypervisor (XenGT) and added the gScale's scalability features [20]. roughout this paper, we consider the modified gVirt as gVirt.…”

Section: Overview Of Gvirtmentioning

confidence: 99%

“…However, if the vGPU is scheduled out, the CPU cannot access the vGPU through aperture. To solve this problem, the gScale allows the CPU to access the vGPU space at all times through the fence memory space pool, which ensures the proper operation of ladder mapping for mapping the guest physical address to the host physical address [20].…”

Section: Scientific Programmingmentioning

confidence: 99%

“…In current gVirt, 64 MB low global graphics memory and 384 MB high global graphics memory are recommended for Linux VM [20] because those memory sizes are enough to support most GPU workloads without performance degradation and crash from the experiments. However, we showed in a previous study [22] that large high global graphics memory can sometimes increase the performance of GPU workloads.…”

Section: Motivationmentioning

confidence: 99%

“…In Figure 4, when the GPU memory of vGPU3 should be reduced by one slot, reducing slot 4 rather than slot 5 can minimize the GPU memory sharing among the VMs. us, the policy for minimizing the GPU memory sharing can maximize the effect of the predictive-copy technique [21] that copies the GTT entries in advance by predicting the next scheduled vGPU before the GPU context switch. e detailed algorithms for memory expansion and reduction are presented below.…”

Section: Gpu Memory Adjustment Algorithmmentioning

confidence: 99%

“…As a result, a single host could create up to only a maximum of three VMs. e gScale [20,21] solved this scalability problem by dividing the graphics memory into multiple small slots and letting VMs share the slots. Private and physical graphics translation tables (GTTs) are used to translate the virtual address generated by each VM into the physical address for the graphics memory.…”

Section: Introductionmentioning

confidence: 99%

See 4 more Smart Citations

Ballooning Graphics Memory Space in Full GPU Virtualization Environments

Park

2019

Scientific Programming

View full text Add to dashboard Cite

Advances in virtualization technology have enabled multiple virtual machines (VMs) to share resources in a physical machine (PM). With the widespread use of graphics-intensive applications, such as two-dimensional (2D) or 3D rendering, many graphics processing unit (GPU) virtualization solutions have been proposed to provide high-performance GPU services in a virtualized environment. Although elasticity is one of the major benefits in this environment, the allocation of GPU memory is still static in the sense that after the GPU memory is allocated to a VM, it is not possible to change the memory size at runtime. This causes underutilization of GPU memory or performance degradation of a GPU application due to the lack of GPU memory when an application requires a large amount of GPU memory. In this paper, we propose a GPU memory ballooning solution called gBalloon that dynamically adjusts the GPU memory size at runtime according to the GPU memory requirement of each VM and the GPU memory sharing overhead. The gBalloon extends the GPU memory size of a VM by detecting performance degradation due to the lack of GPU memory. The gBalloon also reduces the GPU memory size when the overcommitted or underutilized GPU memory of a VM creates additional overhead for the GPU context switch or the CPU load due to GPU memory sharing among the VMs. We implemented the gBalloon by modifying the gVirt, a full GPU virtualization solution for Intel’s integrated GPUs. Benchmarking results show that the gBalloon dynamically adjusts the GPU memory size at runtime, which improves the performance by up to 8% against the gVirt with 384 MB of high global graphics memory and 32% against the gVirt with 1024 MB of high global graphics memory.

show abstract

Section: Overview Of Gvirtmentioning

confidence: 99%

Section: Scientific Programmingmentioning

confidence: 99%

Section: Motivationmentioning

confidence: 99%

Section: Gpu Memory Adjustment Algorithmmentioning

confidence: 99%

Section: Introductionmentioning

confidence: 99%

See 3 more Smart Citations

Ballooning Graphics Memory Space in Full GPU Virtualization Environments

Park

2019

Scientific Programming

View full text Add to dashboard Cite

show abstract

Adaptive and transparent task scheduling of GPU‐powered clusters

Yang

et al. 2020

Concurrency and Computation

View full text Add to dashboard Cite

GPGPU-powered supercomputers are vital for various science and engineering applications. On each cluster node, the GPU works as a coprocessor of the CPU, and the computing task runs alternatively on CPU and GPU. Due to this characteristic, traditional task scheduling strategy tends to result in significant workload imbalance and underutilization of GPUs. We design an adaptive scheduling strategy to alleviate such imbalance and underutilization. Our strategy proposes to logically treats all GPUs in the cluster as a whole. Every cluster node maintains a local information table of all GPUs. Once a GPU call request is received, a node selects a GPU to run the task in an adaptive manner based on this table. In addition, our strategy does not rely on a global queue, and thus avoids excessive internode communication overhead. Moreover, we encapsulate our strategy into an intermedia module between the cluster and users. Consequently, underlying details of task scheduling is transparent to users, which enhances usability. We validate our strategy through experiments.

show abstract