2014
DOI: 10.1145/2644865.2541963
Disengaged scheduling for fair, protected access to fast computational accelerators

Abstract: Today's operating systems treat GPUs and other computational accelerators as if they were simple devices, with bounded and predictable response times. With accelerators assuming an increasing share of the workload on modern machines, this strategy is already problematic, and likely to become untenable soon. If the operating system is to enforce fair sharing of the machine, it must assume responsibility for accelerator scheduling and resource management. Fair, safe scheduling is a particular challenge…

Cited by 6 publications (10 citation statements) · References 16 publications
“…After launching the GPU kernel, the host thread continues its execution. Right before it reaches the next waiting point, it gets into a while loop, in which it repeatedly checks the KState until it becomes "DONE".…”
Section: Basic Implementation of the Runtime (mentioning; confidence: 99%)
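The excerpt describes a common launch-then-poll pattern: the host thread continues useful work after an asynchronous kernel launch, then spins on a per-kernel state word until it reads "DONE". A minimal sketch of that pattern follows; the names `KState` and `"DONE"` come from the quoted runtime, but the thread and lock machinery here is purely illustrative, not the cited authors' implementation:

```python
import threading
import time

class KState:
    """Illustrative stand-in for the runtime's per-kernel state word.
    "RUNNING"/"DONE" mirror the values named in the excerpt; the rest
    is an assumption for the sake of a runnable example."""
    def __init__(self):
        self._lock = threading.Lock()
        self._value = "RUNNING"

    def get(self):
        with self._lock:
            return self._value

    def set(self, value):
        with self._lock:
            self._value = value

def fake_gpu_kernel(state, duration=0.05):
    # Simulates an asynchronously launched kernel that flips
    # KState to DONE when it finishes.
    time.sleep(duration)
    state.set("DONE")

def host_thread(state, results):
    # After "launching" the kernel, the host keeps doing useful work...
    results.append("overlapped-host-work")
    # ...then, at the next waiting point, spins until KState == DONE.
    while state.get() != "DONE":
        time.sleep(0.001)  # back off briefly between polls
    results.append("kernel-finished")

state = KState()
results = []
threading.Thread(target=fake_gpu_kernel, args=(state,)).start()
host_thread(state, results)
print(results)  # ['overlapped-host-work', 'kernel-finished']
```

The brief sleep inside the polling loop stands in for whatever back-off the real runtime uses; a tight spin without it would burn a CPU core while waiting.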
“…Moreover, the default scheduling is oblivious to the priorities of kernels. Numerous studies [1][2][3][4] have shown that the problematic way to manage GPU causes serious unfairness, response delays, and low GPU utilizations.…”
Section: Introduction (mentioning; confidence: 99%)
“…Table 1 shows the statistics generated by the agents. Rinnegan does not yet handle directly accessible accelerators, but disengaged schedulers [41] can be extended to implement the agent functionality for such devices. GPU agent.…”
Section: Accelerator Agents (mentioning; confidence: 99%)
“…It may be up to the device driver for an accelerator to make scheduling decisions, which may not be coordinated with CPU scheduling. While there have been research systems such as PTask and others [25,41,56] that perform scheduling for tasks on a single processing unit, they are unable to select between multiple possible units for a task. Conversely, application runtimes for heterogeneous systems can run a task on different processing units [6,39], but support only static heterogeneity, where the performance of a processing unit does not vary over time.…”
Section: Introduction (mentioning; confidence: 99%)