2018
DOI: 10.1007/s11227-017-2231-3
Language-based vectorization and parallelization using intrinsics, OpenMP, TBB and Cilk Plus

Abstract: The aim of this paper is to evaluate OpenMP, TBB and Cilk Plus as basic language-based tools for simple and efficient parallelization of recursively defined computational problems and other problems that need both task and data parallelization techniques. We show how to use these models of parallel programming to transform a source code of Adaptive Simpson's Integration to programs that can utilize multiple cores of modern processors. Using the example of the Bellman-Ford algorithm for solving single-source shortest…

Cited by 13 publications (7 citation statements)
References 10 publications
“…Conversely, Cilk++, TBB and CUDA graphs all require some refactoring of the code, for different reasons: (a) Cilk++ does not provide data-flow dependencies, but full synchronizations instead; (b) TBB decouples the description of the graph from its execution, and requires specific functions for starting the graph and joining results; and (c) CUDA graphs provide a low-level API that forces programmers to manage data copies and point-to-point synchronizations. A performance comparison between these models is beyond the scope of this paper, but several works have already tackled this topic, showing OpenMP performance competitive with the other parallel models [12,17].…”
Section: The TDG: A Door for Expanding Portability
confidence: 99%
“…Also, some works use OpenMP task pragmas for parallelization [33,52,63]. Table 5 (parallelization language/library used on Phi): MPI [4,9,12,16,22,24,25,28,35,36,43,44,50,54,59,60,63,66,76,82,86]; others: Pthreads [11,23,76,84,91,95], Intel TBB [8,48], Cilk Plus [48], OpenCL [49,100]. Chatzikonstantis et al. [28] study the inferior-olivary nucleus (InfOli) simulation, which is used in brain modeling. They accelerate the simulation using (i) MPI, (ii) OpenMP, and (iii) hybrid MPI+OpenMP.…”
Section: Hou et al. [88] Present a Technique for Automatically Generating…
confidence: 99%
“…[12–14,16–18,20,21,24,27–30,33–37,39,42–48,50,52,55,57,59,63,66,84,86,90,92,93,95,99]; Intel MKL [2,17,19,31,32,40,93,99]…”
confidence: 99%
“…Our OpenMP (version 3.1) implementation of this method achieved very good speedups on Intel Xeon CPUs (up to 5.06) and Intel Xeon Phi (up to 29.45). While this approach can be further improved using more sophisticated vectorization techniques, such as intrinsics [1, 8, 11], doing so comes at the cost of portability between different architectures. OpenACC is a standard for accelerated computing [2, 6].…”
Section: Introduction
confidence: 99%