Adaptive Partitioning for Iterated Sequences of Irregular OpenCL Kernels

Huchant, Pierre; Barthou, Denis; Counilh, Marie-Christine

doi:10.1109/cahpc.2018.8645867

Cited by 2 publications

(3 citation statements)

References 10 publications

“…To distribute the work to each processing unit, we rely on an adaptive partitioning algorithm [12]. The algorithm implemented in OPENCARP consists in the following steps:…”

Section: B Load Balancing (mentioning)

confidence: 99%

“…Furthermore, due to the lightweight nature of OPENCARP ionic models kernels and the wave-oriented layout of their execution, instead of a pure runtime approach balancing tasks among devices such as STARPU, we propose an hybrid compiler/runtime approach. Thus, we adapted the load balancing algorithm proposed by Huchant et al [12], [13] to match OPENCARP needs. At each computation iteration, this algorithm adapts the workload chunk size given to each device (if needed) according to the execution time of the previous iteration.…”

Section: Introduction (mentioning)

confidence: 99%
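
The per-iteration adaptation described in this statement can be illustrated with a minimal sketch. It is not the algorithm of Huchant et al. or the OPENCARP implementation; it only assumes that each device's chunk is resized in proportion to the throughput it achieved in the previous iteration, and that no change is made when the measured times are already close. The function name `rebalance` and the `tolerance` parameter are hypothetical.

```python
def rebalance(chunks, times, tolerance=0.05):
    """Resize per-device chunks from the previous iteration's timings.

    chunks: work items given to each device in the previous iteration.
    times: measured execution time per device (assumed > 0).
    tolerance: hypothetical threshold; below this relative imbalance
               the partition is left unchanged (the "if needed" case).
    """
    total = sum(chunks)
    slowest, fastest = max(times), min(times)
    if (slowest - fastest) / slowest <= tolerance:
        return list(chunks)  # already balanced, keep the partition

    # Throughput observed in the previous iteration (items per time unit).
    rates = [c / t for c, t in zip(chunks, times)]
    total_rate = sum(rates)

    # Assign work proportionally to throughput, rounding down.
    new_chunks = [int(total * r / total_rate) for r in rates]

    # Give any rounding remainder to the fastest device so that the
    # total amount of work is preserved.
    new_chunks[rates.index(max(rates))] += total - sum(new_chunks)
    return new_chunks


if __name__ == "__main__":
    # Two devices, equal chunks, the second one took four times longer.
    print(rebalance([500, 500], [1.0, 4.0]))  # [800, 200]
```

One limitation of this sketch: a device that receives an empty chunk reports zero throughput and never gets work again, so a real implementation would need a minimum chunk size or a periodic probe.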

Performance Portability of Generated Cardiac Simulation Kernels Through Automatic Dimensioning and Load Balancing on Heterogeneous Nodes

Alba, Aumage, Barthou et al. 2024

2024 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)

Self Cite

Electrophysiology simulation applications, such as the community-developed OPENCARP framework for in-silico experiments, involve applying a broad range of ionic model kernels with different computational weights and arithmetic intensity characteristics. Efficiently processing such kernels on modern heterogeneous architectures requires accurately dimensioning the set of computing resources to use and actively balancing the load on the available computing units, to account for discrepancies in kernel duration and distinct computing unit speeds. We thus propose the following contributions: 1) the adaptation of an existing load-balancing algorithm to transparently manage the mapping of these ionic model kernels onto the heterogeneous units of a computing node; 2) a resource dimensioning heuristic that constrains the number of devices that should be used to maximize efficiency, according to the selected ionic models' computational weight; 3) the integration of these mechanisms in OPENCARP, building on prior work that took advantage of LLVM's MLIR framework to generate multiple device-specialized variants of kernels from ionic models expressed in OPENCARP's high-level DSL; 4) a thorough experimental evaluation of the mechanisms on a comprehensive series of 30 ionic models provided by OPENCARP. The experiments show that when using the combination of the load-balancing algorithm and the resource dimensioning heuristic to compute each ionic model, the geometric mean of speedup is 9.97× with respect to the original multi-threaded code on an architecture with two A100 GPUs and two 32-core AMD Zen3 CPUs.

“…[33] Therefore, we propose a generalizable approach to partition the workload on the function level. Other works that partition on the function level either require periodic re-evaluation of the current assignment to adjust to the irregularly evolving workload [34], or they need to profile the hardware and program offline [35]. To reduce overhead, we propose a heuristic to dynamically trigger re-evaluation of the hardware assignment.…”

Section: Related Work (mentioning)

confidence: 99%

OpenABLext: An automatic code generation framework for agent‐based simulations on CPU‐GPU‐FPGA heterogeneous platforms

Xiao, Andelfinger, Cai et al. 2020

Concurrency and Computation

The execution of agent-based simulations (ABSs) on hardware accelerator devices such as graphics processing units (GPUs) has been shown to offer great performance potentials. However, in heterogeneous hardware environments, it can become increasingly difficult to find viable partitions of the simulation and provide implementations for different hardware devices. To automate this process, we present OpenABLext, an extension to OpenABL, a model specification language for ABSs. By providing a device-aware OpenCL backend, OpenABLext enables the co-execution of ABS on heterogeneous hardware platforms consisting of central processing units, GPUs, and field programmable gate arrays (FPGAs). We present a novel online dispatching method that efficiently profiles partitions of the simulation during run-time to optimize the hardware assignment while using the profiling results to advance the simulation itself. In addition, OpenABLext features automated conflict resolution based on user-specified rules, supports graph-based simulation spaces, and utilizes an efficient neighbor search algorithm. We show the improved performance of OpenABLext and demonstrate the potential of FPGAs in the context of ABS. We illustrate how co-execution can be used to further lower execution times. OpenABLext can be seen as an enabler to tap the computing power of heterogeneous hardware platforms for ABS.
