Optimizing the Exploitation of Multicore Processors and GPUs with OpenMP and OpenCL

Ferrer, Roger; Planas, Judit; Bellens, Pieter; Durán, Alejandro; González, Marc; Martorell, Xavier; Badía, Rosa M.; Ayguadé, Eduard; Labarta, Jesús

doi:10.1007/978-3-642-19595-2_15

Cited by 25 publications

(18 citation statements)

References 11 publications

“…Ferrer et al [17] proposed OmpSs, a programming model based on OpenMP and StarSs, which can also incorporate the use of OpenCL or CUDA kernels. They evaluated their model with four benchmarks on three different types of hardware platforms (Intel Xeon Server, Cell/B.E., Nvidia GPUs), and compared the results obtained with the execution of the same benchmarks written in OpenCL.…”

Section: Related Work (mentioning)

confidence: 99%

Performance Gaps between OpenMP and OpenCL for Multi-core CPUs

Shen, Fang, Sips, et al. 2012

2012 41st International Conference on Parallel Processing Workshops

Abstract: OpenCL and OpenMP are the most commonly used programming models for multi-core processors. They are also fundamentally different in their approach to parallelization. In this paper, we focus on comparing the performance of OpenCL and OpenMP. We select three applications from the Rodinia benchmark suite (which provides equivalent OpenMP and OpenCL implementations), and carry out experiments with different datasets on three multi-core platforms. We see that the incorrect usage of the multi-core CPUs, the inherent OpenCL fine-grained parallelism, and the immature OpenCL compilers are the main reasons that lead to the OpenCL poorer performance. After tuning the OpenCL versions to be more CPU-friendly, we show that OpenCL either outperforms or achieves similar performance in more than 80% of the cases. Therefore, we believe that OpenCL is a good alternative for multi-core CPU programming.

“…In the case of GPGPUs those (low-level) libraries include Brook [13], NVidia CUDA, and OpenCL. At a higher-level, Offload [14] enables offloading of parts of a C++ application, which are wrapped in offload blocks, onto hardware accelerators for asynchronous execution; OMPSs [15] enables the offloading of OpenCL and CUDA kernels as an OpenMP extension [16]. FastFlow, in contrast with these frameworks, does not target specific (hardware) accelerators but realizes a virtual accelerator running on the main CPUs and thus does not require the development of specific code.…”

Section: Related Work (mentioning)

confidence: 99%

Accelerating Code on Multi-cores with FastFlow

Aldinucci, Danelutto, Kilpatrick, et al. 2011

Euro-Par 2011 Parallel Processing

FastFlow is a programming framework specifically targeting cache-coherent shared-memory multicores. It is implemented as a stack of C++ template libraries built on top of lock-free (and memory fence free) synchronization mechanisms. Its philosophy is to combine programmability with performance. In this paper a new FastFlow programming methodology aimed at supporting parallelization of existing sequential code via offloading onto a dynamically created software accelerator is presented. The new methodology has been validated using a set of simple micro-benchmarks and some real applications.

“…We think that gathering all such information will easily excess the capabilities of the analysis tool for large runs and we would like to tackle this issue by rethinking which of the generated events are really required and which could be optional. Additionally, we are also working on a version of Nanos++ for distributed-memory systems [9] and accelerators [12]. We plan to extend the instrumentation mechanism in order to support these versions of the runtime.…”

Section: Discussion (mentioning)

confidence: 99%

On the Instrumentation of OpenMP and OmpSs Tasking Constructs

Servat, Teruel, Llort, et al. 2013

Lecture Notes in Computer Science

Self-citation

Abstract. Parallelism has become more and more commonplace with the advent of the multicore processors. Although different parallel programming models have arisen to exploit the computing capabilities of such processors, developing applications that take benefit of these processors may not be easy. And what is worse, the performance achieved by the parallel version of the application may not be what the developer expected, as a result of a dubious utilization of the resources offered by the processor. We present in this paper a fruitful synergy of a shared memory parallel compiler and runtime, and a performance extraction library. The objective of this work is not only to reduce the performance analysis life-cycle when doing the parallelization of an application, but also to extend the analysis experience of the parallel application by incorporating data that is only known in the compiler and runtime side. Additionally we present performance results obtained with the execution of instrumented application and evaluate the overhead of the instrumentation.
