Comparison of OpenMP 3.0 and Other Task Parallel Frameworks on Unbalanced Task Graphs

Olivier, Stephen Lecler; Prins, Jan F.

doi:10.1007/s10766-010-0140-7

Cited by 33 publications

(22 citation statements)

References 19 publications

(28 reference statements)


“…There are a lot of comparative studies between OpenMP and other CPU specific programming models [11], [12], and also some relevant work on the comparison between CUDA and OpenCL [13], [14]. However, detailed studies on OpenMP and OpenCL are rarely seen.…”

Section: Related Work (mentioning)

confidence: 99%

Performance Gaps between OpenMP and OpenCL for Multi-core CPUs

Shen, Fang, Sips, et al. 2012

2012 41st International Conference on Parallel Processing Workshops


Abstract: OpenCL and OpenMP are the most commonly used programming models for multi-core processors. They are also fundamentally different in their approach to parallelization. In this paper, we focus on comparing the performance of OpenCL and OpenMP. We select three applications from the Rodinia benchmark suite (which provides equivalent OpenMP and OpenCL implementations), and carry out experiments with different datasets on three multi-core platforms. We see that incorrect usage of the multi-core CPUs, the inherent fine-grained parallelism of OpenCL, and immature OpenCL compilers are the main reasons for OpenCL's poorer performance. After tuning the OpenCL versions to be more CPU-friendly, we show that OpenCL either outperforms or achieves similar performance in more than 80% of the cases. Therefore, we believe that OpenCL is a good alternative for multi-core CPU programming.


“…The manuscript precedes the inclusion of the depend clause in OpenMP 4.0, and thus it does not cover the implicit dependencies it enables, critical for our work. This latter observation also pertains to [25], which compares task parallelism under several parallel frameworks based on explicit synchronizations, including OpenMP 3.0 and TBB, which is the backend for the two libraries we have tested.…”

Section: Related Work (mentioning)

confidence: 81%

A Comparison of Task Parallel Frameworks based on Implicit Dependencies in Multi-core Environments

Fraguela, 2017

Proceedings of the 50th Hawaii International Conference on System Sciences (2017)


Abstract: The larger flexibility that task parallelism offers with respect to data parallelism comes at the cost of a higher complexity due to the variety of tasks and the arbitrary patterns of dependences that they can exhibit. These dependencies should be expressed not only correctly, but optimally, i.e. avoiding over-constraints, in order to obtain the maximum performance from the underlying hardware. There have been many proposals to facilitate this non-trivial task, particularly within the scope of nowadays ubiquitous multi-core architectures. A very interesting family of solutions because of their large scope of application, ease of use and potential performance are those in which the user declares the dependences of each task, and lets the parallel programming framework figure out which are the concrete dependences that appear at runtime and schedule accordingly the parallel tasks. Nevertheless, as far as we know, there are no comparative studies of them that help users identify their relative advantages. In this paper we describe and evaluate four tools of this class discussing the strengths and weaknesses we have found in their use.
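The idea of declaring per-task dependences and letting the framework work out the concrete ordering can be illustrated with a small sketch. This is not the mechanism of any of the four tools evaluated in the cited paper, nor of the OpenMP `depend` clause; the function `derive_edges`, the `(name, reads, writes)` task tuples, and the task names are all hypothetical, and the sketch assumes dependences are inferred from overlapping read and write sets in program order.

```python
def derive_edges(tasks):
    """Infer ordering constraints from declared data accesses.

    tasks: list of (name, reads, writes) tuples in program order, where
    reads and writes are sets of data names (hypothetical format).
    Returns a set of (earlier, later) pairs that must not be reordered.
    """
    edges = set()
    for j, (name_j, reads_j, writes_j) in enumerate(tasks):
        for name_i, reads_i, writes_i in tasks[:j]:
            read_after_write = writes_i & reads_j
            write_after_write = writes_i & writes_j
            write_after_read = reads_i & writes_j
            if read_after_write or write_after_write or write_after_read:
                edges.add((name_i, name_j))
    return edges


# Two independent producers and one consumer (illustrative names).
tasks = [
    ("load_a", set(), {"a"}),
    ("load_b", set(), {"b"}),
    ("combine", {"a", "b"}, {"c"}),
]
edges = derive_edges(tasks)
```

Here `load_a` and `load_b` share no data, so no edge links them and a scheduler would be free to run them in parallel, while `combine` must wait for both.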


“…We used the UTS [2], [3], [5] benchmark as the irregular and imbalance workloads in the experiments. Olivier and Prins developed the UTS benchmark using OpenMP [6] for shared memory computers and using UPC [7] for both shared memory and distributed memory computers.…”

Section: Related Work (mentioning)

confidence: 99%

“…Because this region of a shared stack may be accessed concurrently by local and remote threads, the threads require locking so that we must introduce an additional overhead. Work aggregation [3], [5] and multiple work stealing strategies share the idea of paying off overheads. Work aggregation uses a task-chunking technique designed to increase granularity when creating tasks.…”

Section: Related Work (mentioning)

confidence: 99%
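The task-chunking idea mentioned in the statement above, grouping several small work items into one scheduled task so that per-task overhead is paid less often, can be sketched as follows. This is a minimal illustration under assumptions, not the work-aggregation scheme of the cited papers: `make_chunks`, `process_chunk`, `run_chunked`, and the squared-sum workload are hypothetical, and a standard-library thread pool stands in for a task runtime.

```python
from concurrent.futures import ThreadPoolExecutor


def make_chunks(items, chunk_size):
    """Split items into consecutive chunks of at most chunk_size elements."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    return [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]


def process_chunk(chunk):
    """Placeholder workload: one scheduled task handles a whole chunk."""
    return sum(x * x for x in chunk)


def run_chunked(items, chunk_size, workers=4):
    """Submit one task per chunk; return (tasks created, combined result)."""
    chunks = make_chunks(items, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(process_chunk, chunks))
    return len(chunks), sum(partials)


items = list(range(1000))
tasks_fine, total_fine = run_chunked(items, 1)       # 1000 tasks created
tasks_coarse, total_coarse = run_chunked(items, 50)  # 20 tasks created
```

Both runs compute the same total, but the coarse run creates far fewer tasks. The trade-off is that larger chunks leave fewer units available for load balancing.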

“…In StackThreads/MP, idle workers steal tasks from the bottommost stack. We found that this strategy, which is a work stealing strategy of the original StackThreads/MP, results in a large steal overhead in the program of the Unbalanced Tree Search (UTS) [2], [3] benchmark. In such a case, the steal overhead increases rapidly when the number of processors is more than two.…”

Section: Introduction (mentioning)

confidence: 99%


Dynamic Multiple Work Stealing Strategy for Flexible Load Balancing

Adnan, Sato, 2012

IEICE Trans. Inf. & Syst.


Summary: Lazy-task creation is an efficient method of overcoming the overhead of the grain-size problem in parallel computing. Work stealing is an effective load balancing strategy for parallel computing. In this paper, we present dynamic work stealing strategies in a lazy-task creation technique for efficient fine-grain task scheduling. The basic idea is to control load balancing granularity depending on the number of task parents in a stack. The dynamic-length strategy of work stealing uses run-time information, namely the load of the victim, to determine the number of tasks that a thief is allowed to steal. We compare it with the bottommost-first work stealing strategy used in StackThreads/MP and with the fixed-length strategy of work stealing, where a thief requests to steal a fixed number of tasks, as well as with other multithreaded frameworks such as Cilk and OpenMP task implementations. The experiments show that the dynamic-length strategy of work stealing performs well in irregular workloads such as the UTS benchmarks, as well as in regular workloads such as Fibonacci, Strassen's matrix multiplication, FFT, and Sparse-LU factorization. The dynamic-length strategy works better than the fixed-length strategy because it is more flexible; it can avoid load imbalance due to overstealing.
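The contrast between fixed-length and dynamic-length stealing can be made concrete with a small single-threaded sketch. The summary does not give the formula the authors use to size a steal, so the steal-half rule in `steal_dynamic` is an assumption chosen for illustration; the function names and the use of a `deque` as the victim's task stack are likewise hypothetical.

```python
from collections import deque


def steal_fixed(victim, k):
    """Fixed-length strategy: take up to k of the victim's oldest tasks."""
    n = min(k, len(victim))
    return deque(victim.popleft() for _ in range(n))


def steal_dynamic(victim):
    """Dynamic-length strategy (assumed rule): take half of the victim's load."""
    n = len(victim) // 2
    return deque(victim.popleft() for _ in range(n))


# A lightly loaded victim holding three tasks.
victim_a = deque(["t0", "t1", "t2"])
thief_a = steal_fixed(victim_a, 4)   # thief takes everything, victim is left idle

victim_b = deque(["t0", "t1", "t2"])
thief_b = steal_dynamic(victim_b)    # thief takes one task, victim keeps two
```

With a fixed request of four tasks the thief empties the victim, which then has to steal work back. Sizing the steal from the victim's current load avoids that overstealing in this example.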
