Proceedings of the 2018 International Conference on Supercomputing
DOI: 10.1145/3205289.3205311
Classification-Driven Search for Effective SM Partitioning in Multitasking GPUs

Abstract: Graphics processing units (GPUs) feature an increasing number of streaming multiprocessors (SMs) with each successive generation. At the same time, GPUs are increasingly widely adopted in cloud services and data centers to accelerate general-purpose workloads. Running multiple applications on a GPU in such environments requires effective multitasking support. Spatial multitasking in which independent applications co-execute on different sets of SMs is a promising solution to share GPU resources. Unfortunately, …
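The even SM partitioning that the abstract and the citing papers use as a baseline can be illustrated with a small sketch. This is purely illustrative (the function name and dict-based allocation are assumptions, not the paper's implementation; real spatial multitasking is enforced at the hardware/driver level):

```python
def partition_sms_evenly(num_sms, apps):
    """Assign each application a contiguous, disjoint set of SM indices,
    splitting as evenly as possible (leftover SMs go to the first apps).
    Illustrative only: actual GPU schedulers partition SMs in hardware."""
    base, extra = divmod(num_sms, len(apps))
    allocation, start = {}, 0
    for i, app in enumerate(apps):
        count = base + (1 if i < extra else 0)
        allocation[app] = list(range(start, start + count))
        start += count
    return allocation

# e.g. partition_sms_evenly(8, ["A", "B"]) -> {"A": [0, 1, 2, 3], "B": [4, 5, 6, 7]}
```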


Cited by 21 publications (8 citation statements)
References 30 publications
“…The performance of the proposed framework for dynamic optimizations is evaluated with the baseline with evenly partitioned spatial multitasking approach [9], [20], [21] and CD-search [13]. Fig.…”
Section: Results
confidence: 99%
“…Sharing of resources within the core and rest of the off-chip memoryresources have been explored in [12], [11]. We use spatial multitasking for optimizations and compare its performance with the state-of-the-art technique classification driven search (CD-search) [13].…”
Section: Introduction
confidence: 99%
“…However, if a kernel completes its execution, we end the current epoch and start a new one with an even SM allocation because behavior can be very different across kernels. To reduce the preemption overhead, we adaptively choose between a draining versus context switching policy [5,26]. If, during the epoch, the number of TBs finished on one SM is larger than the number of TBs that can be concurrently executed on one SM, we follow a draining policy; if not, we adopt the context switching policy.…”
Section: Implementing HSM-based SM Allocation
confidence: 99%
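The draining-versus-context-switching rule quoted above can be sketched in a few lines. This is a minimal illustration of the stated decision rule only; the function and parameter names are hypothetical, not taken from the cited paper's code:

```python
def choose_preemption_policy(tbs_finished_on_sm, max_concurrent_tbs_per_sm):
    """Select a preemption policy for one SM, per the rule quoted above:
    if more thread blocks (TBs) have already finished on the SM during the
    epoch than can run on it concurrently, drain the remaining TBs;
    otherwise, context-switch the SM to the other application."""
    if tbs_finished_on_sm > max_concurrent_tbs_per_sm:
        return "draining"
    return "context_switching"

# e.g. choose_preemption_policy(10, 4) -> "draining"
#      choose_preemption_policy(2, 4)  -> "context_switching"
```

The intuition behind the rule: many completed TBs imply short-running blocks, so the SM will drain quickly and cheaply; few completions imply long-running blocks, making an explicit context switch the faster way to reclaim the SM.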
“…Managing SMs among concurrent applications in multitasking GPUs received significant attention recently [3,4,14,15,26,42]. These approaches indirectly infer the performance impact of a particular SM allocation; HSM, in contrast, predicts the performance impact of a particular SM allocation.…”
Section: Related Work
confidence: 99%
“…Graphics Processing Units (GPUs) are throughputoriented co-processors that are witnessing a rapid increase in the amount of computing resources. To avoid keeping these growing resources underutilized and improve performance, concurrent kernel execution (CKE) has been proposed and showed improved GPU throughput and resource utilization [1], [2], [3].…”
Section: Introduction
confidence: 99%