Abstract: The computational power provided by many-core graphics processing units (GPUs) has been exploited in many applications. The programming techniques currently employed on these GPUs are not sufficient to address problems exhibiting irregular and unbalanced workloads. The problem is exacerbated when trying to effectively exploit multiple GPUs concurrently, which are commonly available in many modern systems. In this paper, we propose a task-based dynamic load-balancing solution for single- and multi-GPU systems. Th…
“…However, the approach does not scale very well for large numbers of threads. Using mapped memory, Chen and Villa [6] have introduced a concept which uses non-blocking task queues to implement a master-worker pattern, where the main CPU is able to generate tasks after a kernel has been launched. This approach is very well suited for scenarios where multiple GPUs have to be supplied with tasks.…”
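The pattern described above can be illustrated with a host-side analogue. The sketch below is an assumption-laden simplification, not the cited implementation: worker threads stand in for GPU consumers, they start first ("kernel launch"), and the master keeps feeding a non-blocking queue while they run.

```python
import queue
import threading

def worker(tasks, results, stop):
    # Consume tasks until the master signals completion and the queue drains.
    while not stop.is_set() or not tasks.empty():
        try:
            item = tasks.get_nowait()   # non-blocking dequeue
        except queue.Empty:
            continue
        results.put(item * item)        # stand-in for real device work
        tasks.task_done()

tasks, results = queue.Queue(), queue.Queue()
stop = threading.Event()
workers = [threading.Thread(target=worker, args=(tasks, results, stop))
           for _ in range(4)]
for w in workers:
    w.start()                           # workers are already running...

for i in range(100):                    # ...while the master generates tasks
    tasks.put(i)
tasks.join()                            # wait until every task is processed
stop.set()
for w in workers:
    w.join()

total = sum(results.get() for _ in range(results.qsize()))
```

The key property mirrored here is decoupling: producers and consumers never block each other on the queue, which is what makes the pattern suitable for feeding multiple devices concurrently.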
GPU compute devices have become very popular for general-purpose computations. However, the SIMD-like hardware of graphics processors is currently not well suited for irregular workloads, like searching unbalanced trees. In order to mitigate this drawback, NVIDIA introduced an extension to GPU programming models called Dynamic Parallelism. This extension enables GPU programs to spawn new units of work directly on the GPU, allowing the refinement of subsequent work items based on intermediate results without any involvement of the main CPU. This work investigates methods for employing Dynamic Parallelism with the goal of improved workload distribution for tree search algorithms on modern GPU hardware. For the evaluation of the proposed approaches, a case study is conducted on the N-Queens problem. Extensive benchmarks indicate that the benefits of improved resource utilization fail to outweigh high management overhead and runtime limitations due to the very fine level of granularity of the investigated problem. However, novel memory management concepts for passing parameters to child grids are presented. These general concepts are applicable to other, more coarse-grained problems that benefit from the use of Dynamic Parallelism.
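The work-refinement idea behind Dynamic Parallelism can be sketched without CUDA: each partial N-Queens placement is a task, and expanding it enqueues child tasks for the next row, much as a parent kernel would launch child grids from intermediate results. This is a conceptual sketch only, not the paper's GPU code.

```python
from collections import deque

def safe(cols, row, col):
    # A queen at (row, col) conflicts with an earlier queen at (r, c)
    # if they share a column or a diagonal.
    return all(c != col and abs(c - col) != row - r
               for r, c in enumerate(cols))

def count_nqueens(n):
    worklist = deque([()])          # each task: tuple of chosen columns so far
    solutions = 0
    while worklist:
        cols = worklist.popleft()
        row = len(cols)
        if row == n:
            solutions += 1
            continue
        for col in range(n):        # "child launches" for the next row,
            if safe(cols, row, col):  # generated from intermediate results
                worklist.append(cols + (col,))
    return solutions

print(count_nqueens(8))             # 92 solutions for the 8-queens problem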
“…It divides parallel computing tasks according to execution speed to achieve the best overall system performance. In [2] a multi-GPU self-adaptive load balancing method was proposed. Each GPU can self-adaptively select tasks to execute according to its local free-busy state, by establishing a task-queue model between the CPU and the GPUs.…”
Abstract. With the development of GPU general-purpose computing, GPU heterogeneous clusters have become a widely used parallel data processing solution in modern data centers. Temperature management and control have become a new research hotspot in big data continuous computing. Temperature heat islands in a cluster have an important influence on computing reliability and energy efficiency. To prevent the occurrence of temperature heat islands in GPU clusters, a big data task scheduling model is proposed. In this model, temperature, reliability and computing performance are taken into account to reduce node performance differences and improve throughput per unit time in the cluster. Temperature heat islands caused by slow nodes are prevented by optimized scheduling. The experimental results show that the proposed scheme can control node temperature and prevent the occurrence of temperature heat islands while guaranteeing computing performance and reliability.
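The abstract does not specify the scheduling model, but the trade-off it names (temperature vs. performance) can be illustrated with a minimal policy sketch. All node names, temperatures and thresholds below are assumptions for illustration: schedule each task on the coolest node whose throughput still meets a minimum bound.

```python
def pick_node(nodes, min_throughput):
    """nodes: list of (name, temp_celsius, throughput).

    Hypothetical policy: among nodes meeting the performance bound,
    choose the coolest one to avoid forming a heat island."""
    eligible = [n for n in nodes if n[2] >= min_throughput]
    if not eligible:
        return None
    return min(eligible, key=lambda n: n[1])[0]   # coolest eligible node

# Assumed cluster state: (name, temperature, relative throughput).
cluster = [("node-a", 71.0, 9.5),
           ("node-b", 58.0, 8.1),
           ("node-c", 64.0, 7.0)]

print(pick_node(cluster, 8.0))   # node-b: coolest node meeting the bound
```

Raising the performance bound shifts load back toward hotter, faster nodes, which is exactly the tension such a scheduler has to balance.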
“…StarPU [15] is designed to be a platform for heterogeneous task scheduling. Along with StarPU, Qilin [16], Scout [17], the dynamic load balancing system created by Chen et al. [18], and the work by Jiménez et al. [19] form a solid foundation for both the need and the capability for a heterogeneous task scheduler. These solutions, however, require the user either to reimplement their application (in a new programming language in the case of StarPU or Scout, or against a new API in Qilin) or to manually create multiple copies of a function for multiple platforms to provide to the scheduler.…”
Abstract: Heterogeneous systems with CPUs and computational accelerators such as GPUs, FPGAs or the upcoming Intel MIC are becoming mainstream. In these systems, peak performance includes the performance of not just the CPUs but also all available accelerators. In spite of this fact, the majority of programming models for heterogeneous computing focus on only one of these. With the development of Accelerated OpenMP for GPUs, both from PGI and Cray, we have a clear path to extend traditional OpenMP applications incrementally to use GPUs. The extensions are geared toward switching from CPU parallelism to GPU parallelism. However, they do not preserve the former while adding the latter. Thus computational potential is wasted, since either the CPU cores or the GPU cores are left idle. Our goal is to create a runtime system that can intelligently divide an accelerated OpenMP region across all available resources automatically. This paper presents our proof-of-concept runtime system for dynamic task scheduling across CPUs and GPUs. Further, we motivate the addition of this system into the proposed OpenMP for Accelerators standard. Finally, we show that this option can produce as much as a two-fold performance improvement over using either the CPU or GPU alone.
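One common way such a runtime divides a parallel region (a hedged sketch of the general idea, not this paper's scheduler) is to measure per-device throughput and split the iteration space in proportion to it. The device names and throughput numbers below are assumptions.

```python
def split_iterations(n_iters, throughputs):
    """Partition [0, n_iters) proportionally to measured throughputs."""
    total = sum(throughputs.values())
    bounds, start = {}, 0
    items = list(throughputs.items())
    for i, (device, rate) in enumerate(items):
        # The last device takes the remainder so the ranges cover [0, n_iters).
        if i == len(items) - 1:
            end = n_iters
        else:
            end = start + round(n_iters * rate / total)
        bounds[device] = (start, end)
        start = end
    return bounds

# Example: a GPU measured roughly 3x faster than the CPU cores combined.
parts = split_iterations(1000, {"cpu": 1.0, "gpu": 3.0})
print(parts)   # {'cpu': (0, 250), 'gpu': (250, 1000)}
```

A dynamic scheduler like the one described in the abstract would re-measure and re-split repeatedly rather than once, but the proportional-split step is the core of keeping both the CPU and GPU busy.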