Exploiting Co-execution with OneAPI: Heterogeneity from a Modern Perspective

Nozal, Raúl; Bosque, José Luis

doi:10.1007/978-3-030-85665-6_31

Cited by 11 publications

(4 citation statements)

References 21 publications

“…Coexecutor Runtime is presented, extending the work and preliminary results of the conference paper [29]. The key innovations are the high level API, increasing the abstraction but maintaining its compatibility and extensibility with SYCL; an efficient architectural design focused on preserving and reusing as many oneAPI primitives as possible while extending its functionality; and as far as we know, it is the first co-execution runtime for Intel oneAPI.…”

Section: Introduction (mentioning)

confidence: 84%

“…The experiments to validate the Coexecutor Runtime [29] (https://github.com/oneAPIscheduling/CoexecutorRuntime, accessed on 28 September 2021) were carried out in two nodes, labeled Desktop and DevCloud. Desktop was a computer with an Intel Core i5-7500 Kaby Lake architecture processor, with four cores at 3400 MHz, one thread per core and three cache levels.…”

Section: Methods (mentioning)

confidence: 99%

Straightforward Heterogeneous Computing with the oneAPI Coexecutor Runtime

Nozal

Bosque

2021

Electronics

Heterogeneous systems are the core architecture of most computing systems, from high-performance computing nodes to embedded devices, due to their excellent performance and energy efficiency. Efficiently programming these systems has become a major challenge due to the complexity of their architectures and the efforts required to provide them with co-execution capabilities that can fully exploit the applications. There are many proposals to simplify the programming and management of acceleration devices and multi-core CPUs. However, in many cases, portability and ease of use compromise the efficiency of different devices—even more so when co-executing. Intel oneAPI, a new and powerful standards-based unified programming model, built on top of SYCL, addresses these issues. In this paper, oneAPI is provided with co-execution strategies to run the same kernel between different devices, enabling the exploitation of static and dynamic policies. This work evaluates the performance and energy efficiency for a well-known set of regular and irregular HPC benchmarks, using two heterogeneous systems composed of an integrated GPU and CPU. Static and dynamic load balancers are integrated and evaluated, highlighting single and co-execution strategies and the most significant key points of this promising technology. Experimental results show that co-execution is worthwhile when using dynamic algorithms and improves the efficiency even further when using unified shared memory.
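The difference between the static and dynamic policies mentioned in the abstract can be illustrated with a minimal simulation. This is only a sketch under stated assumptions, not the Coexecutor Runtime's API: the function names, the `speeds` table (items processed per time unit) and the package size are all hypothetical, and the devices are simulated rather than driven through oneAPI.

```python
def static_split(total_items, cpu_share):
    """Split the index range once, up front.

    cpu_share is a guessed fraction in [0, 1] (hypothetical parameter).
    """
    cpu_items = int(total_items * cpu_share)
    return {"cpu": (0, cpu_items), "gpu": (cpu_items, total_items)}


def dynamic_split(total_items, package_size, speeds):
    """Hand out fixed-size packages to whichever device becomes idle first.

    speeds maps a device name to items processed per time unit (assumed,
    not measured). Returns (items assigned per device, finish time per device).
    """
    clock = {dev: 0.0 for dev in speeds}     # simulated time at which each device is idle
    assigned = {dev: 0 for dev in speeds}
    next_item = 0
    while next_item < total_items:
        dev = min(clock, key=clock.get)      # earliest idle device takes the next package
        size = min(package_size, total_items - next_item)
        clock[dev] += size / speeds[dev]
        assigned[dev] += size
        next_item += size
    return assigned, clock


if __name__ == "__main__":
    speeds = {"cpu": 1.0, "gpu": 4.0}        # hypothetical relative throughputs
    print(static_split(1000, 0.5))
    print(dynamic_split(1000, 100, speeds))
```

In this toy model a static split depends entirely on how good the up-front guess is, while the dynamic policy adapts to the simulated device speeds at the cost of managing more packages.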

“…There are situations in which an algorithm performs better in one type of problem or device, and in other cases another one behaves much better. For example, an integrated GPU that supports compute-communication overlap via multiple queues, when faced with a program behavior like NBody, can benefit from algorithms that divide the load into many small packets [5,17], while a discrete accelerator faced with the execution of many short-lived kernels generally cannot amortize the management overhead, and is better suited to algorithms that exploit very large packets [18–20]. For this reason, it is necessary to provide an appropriate and more sophisticated load balancing algorithm that take into account the context of the simulation and the runtime system.…”

Section: Motivation (mentioning)

confidence: 99%

“…There have been different works related to combining heterogeneous programming models and technologies [1–6], but they usually provide explicit code inputs, isolation of technologies by tasks, focus only on CPU-GPU distribution or use non-OpenCL-based languages. Some works focus on providing load distribution for HPC simulation environments [1,7–13], but most focus on distributed technologies in combination with shared memory.…”

Section: Introduction (mentioning)

confidence: 99%

Mashing load balancing algorithm to boost hybrid kernels in molecular dynamics simulations

Nozal

Bosque

2022

J Supercomput

Self Cite

The path to the efficient exploitation of molecular dynamics simulators is strongly driven by the increasingly intensive use of accelerators. However, they suffer from performance portability issues, making it necessary both to achieve technological combinations that allow taking advantage of each programming model and device, and to define more effective load distribution strategies that consider the simulation conditions. In this work, a new load balancing algorithm is presented, together with a set of optimizations to support hybrid co-execution in a runtime system for heterogeneous computing. The new extended design enables the exploitation of custom kernels and acceleration technologies together, encapsulated from the rest of the runtime and its scheduling system. With this support, the Mash algorithm can simultaneously leverage different workload distribution strategies, benefiting from the most advantageous one per device and technology. Experiments show that these proposals achieve an efficiency close to 0.90 and an energy efficiency improvement of around 1.80 over the original optimized version.
