This paper presents the design of an efficient multi-target (CPU+GPU) implementation of the Parallel_for skeleton. Emerging massively parallel architectures promise very high performance at a low cost. However, these architectures change faster than ever, and optimizing codes for them has become a very complex and time-consuming task. We have identified data storage as the main difference between the CPU and the GPU implementations of a code. We introduce an abstract data layout in order to adapt the data storage to the target. Based on this layout, the Parallel_for skeleton allows the same program to be compiled and executed both on CPU and on GPU. Once compiled, the program runs close to the hardware limits.
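To make the data-storage difference concrete, the following sketch (type and function names are ours, not the paper's) contrasts the array-of-structures (AoS) form that suits CPU caches with the structure-of-arrays (SoA) form that yields coalesced memory accesses on GPUs; an abstract data layout lets the same user code target either.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical illustration: the same logical collection of 2D points
// stored two ways.

// AoS: the two coordinates of point i are contiguous in memory,
// which suits per-element CPU access patterns.
struct PointAoS { double x, y; };

// SoA: all x coordinates are contiguous, so consecutive GPU threads
// reading x[i], x[i+1], ... touch consecutive addresses.
struct PointsSoA {
    std::vector<double> x, y;
};

double sumX_aos(const std::vector<PointAoS>& pts) {
    double s = 0.0;
    for (const auto& p : pts) s += p.x;
    return s;
}

double sumX_soa(const PointsSoA& pts) {
    double s = 0.0;
    for (double v : pts.x) s += v;
    return s;
}
```

Both functions compute the same result; only the memory layout behind them differs, which is exactly what an abstract layout must hide from user code.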
Keywords
C++ templates, parallel computing, Nvidia CUDA, Intel TBB, parallel skeletons, data layout
MOTIVATION AND MAIN OBJECTIVES

In many scientific applications, computation time is a strong constraint. Optimizing for rapidly changing computer hardware is a very expensive and time-consuming task, and emerging hybrid architectures tend to make this process even more complex.

The classical way to ease this optimization process is to build applications on top of High Performance Computing (HPC) libraries. Each HPC library allows the scientific developer to use a well-defined Application Programming Interface (API) tailored to its specific scientific sub-domain. Because of their limited scope, it is possible to produce specialized HPC implementations of these libraries for a large variety of target hardware architectures.

Legolas++ is a generic library developed at EDF R&D that provides building blocks for the specific domain of Highly Structured Sparse Linear Algebra (HSSLA) problems arising in many simulation codes. In particular, it makes it possible to deal with recursively blocked matrices (matrices of blocks of blocks of ...) that appear, for example, in neutron transport simulations [16]. In order to build HPC codes meeting EDF's industrial quality standards, a multi-target version of Legolas++ is presently being developed that should provide a unified interface over optimal implementations for both multi-core CPUs and Graphics Processing Units (GPUs). Not all, but a large fraction of the Legolas++ operations are embarrassingly parallel and consist in applying the same function independently to multiple data.
This kind of problem is well described by a Parallel_for algorithm, which is an instance of the parallel algorithmic skeletons introduced in [4]. In this article we propose a design for a C++ multi-target (CPU/GPU) implementation of the Parallel_for skeleton. Thi...
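The skeleton idea can be sketched as follows: a generic `parallel_for` that applies the same functor independently to every index of a range, splitting the work across hardware threads. This is a minimal CPU-only sketch under our own assumptions (names and interface are illustrative, not the paper's actual Legolas++ API, which also targets GPUs).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Minimal sketch of a Parallel_for skeleton: apply f(i) independently
// for every i in [0, n), one contiguous chunk of indices per thread.
template <typename Functor>
void parallel_for(std::size_t n, Functor f) {
    const std::size_t nThreads =
        std::max(1u, std::thread::hardware_concurrency());
    const std::size_t chunk = (n + nThreads - 1) / nThreads;

    std::vector<std::thread> workers;
    for (std::size_t t = 0; t < nThreads; ++t) {
        const std::size_t begin = t * chunk;
        const std::size_t end = std::min(n, begin + chunk);
        if (begin >= end) break;  // fewer elements than threads
        workers.emplace_back([=] {
            for (std::size_t i = begin; i < end; ++i) f(i);
        });
    }
    for (auto& w : workers) w.join();
}
```

Because each index is processed independently, the same user functor could in principle be dispatched either to such a CPU thread pool (e.g. via Intel TBB) or to a CUDA kernel, which is the multi-target property the paper pursues.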