In this paper, we discuss the role, design and implementation of smart containers in the SkePU skeleton library for GPU-based systems. These containers provide an interface similar to C++ STL containers but internally perform runtime optimization of data transfers and runtime memory management for their operand data on the different memory units. We discuss how these containers can help in achieving asynchronous execution of skeleton calls while providing implicit synchronization capabilities in a data-consistent manner. Furthermore, we discuss the limitations of the original, already optimizing memory management mechanism implemented in SkePU containers, and propose and implement a new mechanism that provides stronger data consistency and improves performance by reducing communication and memory allocations. With several applications, we show that our new mechanism can achieve significantly (up to 33.4 times) better performance than the initial mechanism for page-locked memory on a multi-GPU system.

Keywords SkePU · Smart containers · Skeleton programming · Memory management · Runtime optimizations · GPU-based systems

1 Introduction

Skeleton programming [4] for GPU-based systems is becoming increasingly popular for mapping common computational patterns. Several skeleton libraries have been written from scratch specifically targeting GPU-based systems, including SkePU [10, 6], SkelCL [24] and Marrow [20]. Moreover, many existing skeleton libraries initially written for execution on MPI clusters and/or multicore CPUs have been ported for GPU execution, such as FastFlow [12] and Muesli [11]. These libraries differ in their approach and feature offering, but they all aim to provide performance comparable to hand-written code while requiring much less programming effort.

Providing high-level abstraction with good execution performance in a library requires special design consideration.
The question comes down to what is exposed to the programmer and what is handled implicitly by the skeleton library. For example, the Marrow library exposes concurrency to the application by executing skeleton calls asynchronously; it returns a handle which can be used to synchronize execution when needed. This allows Marrow to effectively overlap computation and communication from different skeleton computations. SkelCL makes data distribution explicit so that the application programmer can choose how to map a computation to the underlying computing platform.

Another important aspect of GPU computation is managing communication between CPU (main) memory and GPU (device) memory over the PCIe interconnect. In Muesli, FastFlow, SkePU and SkelCL, skeleton calls can execute on a single-core or multicore CPU as well as on a GPU. Considering that CPUs and GPUs have separate physical memory, execution on a certain compute device may require transferring data back and forth to its associated memory if the data is not already available in that memory. For example, in the following code,

// 1D arrays: v0, v1
skel_c...