2014
DOI: 10.1145/2678373.2665702

Enabling preemptive multiprogramming on GPUs

Abstract: GPUs are being increasingly adopted as compute accelerators in many domains, spanning environments from mobile systems to cloud computing. These systems usually run multiple applications, from one or several users. However, GPUs do not provide the support for resource sharing traditionally expected in these scenarios. Thus, such systems are unable to meet key multiprogrammed-workload requirements such as responsiveness, fairness, or quality of service. In this paper, we propose a set of hardware exten…

Cited by 64 publications (39 citation statements)
References: 27 publications
“…Finally, GPUs were non-preemptive until recently, which means a long running GPU kernel cannot be preempted by software until it finishes. This will cause unfairness between multiple kernels and severely deteriorate the responsiveness of latency-critical kernels [Tanasic et al 2014]. Currently, a new GPU architecture to support GPU kernel preemption has emerged in the market [NVIDIA 2016a], but it is expected that existing GPUs will continue to suffer from this issue.…”
Section: Scheduling Methods
confidence: 99%
“…To overcome this limitation, GPU architectures that support hardwarebased preemption were suggested, and finally preemptive GPUs have emerged in the market recently [4]. However, context switching in such GPUs is very expensive and recent research [5] reports that hardware-based preemption decreases the total throughput up to 35% in a wide range of GPU applications. Owing to this reason, we expect that certain HPC clouds will still employ non-preemptive GPUs for throughput-sensitive applications.…”
Section: Non-preemptive Scheduling
confidence: 99%
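The preceding statement motivates the software-level alternative to expensive hardware context switches: cooperative preemption, where the kernel itself polls a flag and yields at safe points, so that only coarse progress state must be saved. The CUDA sketch below is a minimal illustration of that idea under our own assumptions; it is not the mechanism of the cited papers, and the names (preemptible_kernel, preempt_flag, next_tile) are hypothetical.

```cuda
#include <cuda_runtime.h>

// Cooperative preemption sketch: blocks claim tiles from a global work
// counter and poll a host-visible flag between tiles. When the host sets
// the flag (e.g., through mapped or managed memory), every block exits at
// the next tile boundary; relaunching the kernel later resumes from the
// first unclaimed tile. Assumes blockDim.x == tile_size.
__global__ void preemptible_kernel(float *data, int n_tiles, int tile_size,
                                   int *next_tile, volatile int *preempt_flag)
{
    __shared__ int tile;  // tile claimed by this block in this iteration
    __shared__ int stop;  // block-uniform snapshot of the preemption flag
    for (;;) {
        if (threadIdx.x == 0) {
            stop = *preempt_flag;                      // one coherent read per block
            tile = stop ? 0 : atomicAdd(next_tile, 1); // claim work only if running
        }
        __syncthreads();                      // publish 'stop' and 'tile' to the block
        if (stop || tile >= n_tiles) return;  // uniform exit: no divergent barrier
        data[tile * tile_size + threadIdx.x] *= 2.0f;  // stand-in for real work
        __syncthreads();                      // keep 'tile' stable until all threads finish
    }
}
```

Because a block only claims a tile after checking the flag, every claimed tile is fully processed before the block yields, so the saved state reduces to the single work counter; the cost is polling overhead and a preemption latency bounded by one tile of work.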
“…However, as a GPU is composed of massive computation cores each of which has its own context, the amount of data to be saved and restored at a single time reaches to several hundreds of KB. Unfortunately, this causes significant throughput degradation up to 35% in GPU applications [5]. Therefore, we believe that non-preemptive GPUs are still relevant for performance-sensitive applications, and addressing non-preemptive scheduling remains an important issue.…”
Section: Introduction
confidence: 99%
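The "several hundreds of KB" figure is easy to sanity-check with a back-of-the-envelope calculation. Assuming a Kepler-class SM (our assumption; the statement does not name a specific part) with a 65,536-entry file of 32-bit registers and 48 KB of shared memory:

    65,536 registers × 4 B + 48 KB = 256 KB + 48 KB ≈ 304 KB of context per SM

Across the 15 SMs of such a GPU, a full context switch would move on the order of 4-5 MB through the memory hierarchy, which is consistent with the throughput degradation quoted above.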
“…Hardware support for preemption has been proposed for Nvidia GPUs, as well as SM draining, whereby workgroups occupying a streaming multiprocessor (SM; a compute unit using our terminology) are allowed to complete until the SM becomes free for other tasks [28]. SM draining is limited by the presence of blocking constructs, since it may not be possible to drain a blocked workgroup.…”
Section: Related Work
confidence: 99%
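To make the draining limitation concrete, the toy CUDA kernel below (our construction, not taken from [28]) contains an inter-workgroup dependency: one block spins on a flag that only another block sets. If the spinning block is resident on an SM being drained while the producer block has not yet been scheduled, the spinning block can never run to completion, so the drain never terminates.

```cuda
#include <cuda_runtime.h>

// A blocking construct that defeats SM draining: the consumer block
// busy-waits on a flag written by the producer block, so it cannot
// simply "run to completion" on its own. Illustrative sketch only;
// 'flag' must point to zeroed global memory.
__global__ void blocking_pair(volatile int *flag)
{
    if (blockIdx.x == 0) {
        if (threadIdx.x == 0)
            *flag = 1;               // producer: publish the flag
    } else {
        while (*flag == 0)           // consumer: inter-workgroup busy-wait
            ;                        // blocked until block 0 has run
    }
}
```

Draining a workgroup stuck in such a spin requires either preempting it anyway or guaranteeing that its producer is eventually scheduled, which is precisely the gap that the hardware preemption proposals aim to close.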