Supporting multiple accelerators in high-level programming models

Lin, Pei-Hung; Liao, Chunhua; Supinski, Bronis R. de; Quinlan, Daniel J.

doi:10.1145/2712386.2712405

Cited by 14 publications

(11 citation statements)

References 17 publications

(15 reference statements)

“…The multi-GPU problem has been investigated in literature. For example, the approaches proposed in [21,20] suggest to extend OpenMP to support multiple accelerators in a seamless way. OpenACC has runtime functions to support the utilization of multiple GPU, however, lacks GPU to GPU data transfer, in single node [19] and multinode [9].…”

Section: State Of the Art (mentioning)

confidence: 99%

Exploiting OpenMP and OpenACC to accelerate a geometric approach to molecular docking in heterogeneous HPC nodes

et al. 2019

In drug discovery, molecular docking is the task in charge of estimating the position of a molecule when interacting with the docking site. This task is usually used to perform screening of a large library of molecules, in the early phase of the process. Given the amount of candidate molecules and the complexity of the application, this task is usually performed using High-Performance Computing (HPC) platforms. In modern HPC systems, heterogeneous platforms provide a better throughput with respect to homogeneous platforms. In this work, we ported and optimized a molecular docking application to a heterogeneous system, with one or more GPU accelerators, leveraging a hybrid OpenMP and OpenACC approach. We prove that our approach has a better exploitation of the node compared to pure CPU/GPU data splitting approaches, reaching a throughput improvement up to 36% while considering the same computing node.

“…Yan et al [22] extend the OpenMP [4] and OpenACC [20] pragmas to support scheduling entire kernel executions on different devices. By using annotations, Yan et al aim to modify existing code, thus reducing the barrier to uptake and integration with existing code bases.…”

Section: Scheduling Of Kernels (mentioning)

confidence: 99%

“…The extensions by Yan et al [22] build on existing work to support specifying how data should be partitioned between different hardware devices during a computation:…”

Section: Data Partitioning (mentioning)

confidence: 99%

A Scalable Runtime for the ECOSCALE Heterogeneous Exascale Hardware Platform

Harvey, Bakanov, Spence, et al. 2016

Proceedings of the 6th International Workshop on Runtime and Operating Systems for Supercomputers

Exascale computation is the next target of high performance computing. In the push to create exascale computing platforms, simply increasing the number of hardware devices is not an acceptable option given the limitations of power consumption, heat dissipation, and programming models which are designed for current hardware platforms. Instead, new hardware technologies, coupled with improved programming abstractions and more autonomous runtime systems, are required to achieve this goal. This position paper presents the design of a new runtime for a new heterogeneous hardware platform being developed to explore energy efficient, high performance computing. By combining a number of different technologies, this framework will both simplify the programming of current and future HPC applications, as well as automating the scheduling of data and computation across this new hardware platform. In particular, this work explores the use of FPGAs to achieve both the power and performance goals of exascale, as well as utilising the runtime to automatically effect dynamic configuration and reconfiguration of these platforms.

“…• A range of new methods to fairly compare the efficiency of server architectures (Section VI) and scale these architectures on demand to meet workload QoS requirements [6], [7]. NanoStreams advances the state of the art in micro-servers in several ways by: (a) adding application-specific but programmable hardware accelerators to micro-servers, as opposed to existing solutions that use elaborate hardware design flows and target a single algorithm [8]; (b) providing general purpose low latency networking to access accelerators in the datacentre, as opposed to custom fabrics [9]; (c) effectively integrating streaming and accelerator-aware programming models into domain specific software stacks, moving one step ahead of ongoing efforts to unify heterogeneous programming models [10]; (d) significantly improving server energy-efficiency of micro-servers via on demand and QoS-aware scale-out and acceleration [6], [7].…”

Section: Introduction (mentioning)

confidence: 99%