Proceedings of the 19th Annual International Conference on Supercomputing 2005
DOI: 10.1145/1088149.1088166

Automatic thread distribution for nested parallelism in OpenMP

Abstract: OpenMP is becoming the standard programming model for shared-memory parallel architectures. One of the most interesting features of the language is its support for nested parallelism. Previous research and parallelization experience have shown the benefits of using nested parallelism as an alternative to combining several programming models, such as MPI and OpenMP. However, all of these works rely on the manual definition of an appropriate distribution of all the available threads across the different levels of pa…
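
To make the problem concrete, here is a minimal sketch (not taken from the paper) of the manual thread distribution the abstract refers to: the programmer fixes, by hand, how many threads work at each nesting level through num_threads clauses. The group count of 4 and the inner team size of 2 are illustrative assumptions.

    /* Manual thread distribution across two nesting levels.
     * The counts (4 outer, 2 inner) are assumptions for the sketch;
     * the paper's point is that choosing them well by hand is hard. */
    #include <omp.h>
    #include <stdio.h>

    int main(void)
    {
        omp_set_nested(1);               /* enable nested parallelism */
        omp_set_max_active_levels(2);

        /* Outer level: one thread per "group". */
        #pragma omp parallel num_threads(4)
        {
            int group = omp_get_thread_num();

            /* Inner level: each group is manually given 2 threads. */
            #pragma omp parallel num_threads(2)
            {
                printf("group %d, inner thread %d\n",
                       group, omp_get_thread_num());
            }
        }
        return 0;
    }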

Citations: Cited by 37 publications (32 citation statements)
References: 16 publications
“…Binding, however, is non-portable from the performance point of view. In order to favor affinities in a more portable manner, the NANOS compiler [DGC05, AGMJ04] makes it possible to statically associate groups of threads with parallel regions. The OpenUH compiler [CHJ+06] proposes a mechanism to accurately select the threads of a subteam, although this proposal does not address nested parallelism.…”
Section: Related Work (mentioning)
confidence: 99%
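
As an illustration of the subteam idea mentioned above: standard OpenMP has no subteam construct, so the sketch below emulates one by guarding work with thread-id tests and partitioning iterations by hand, which only approximates what the OpenUH proposal expresses declaratively. The team size of 8 and subteam size of 4 are assumptions for the example.

    /* Emulating a subteam in standard OpenMP: only threads 0..3
     * execute the loop, via an explicit thread-id guard and a
     * manual block partition of the iteration space. */
    #include <omp.h>
    #include <stdio.h>

    #define N 16

    int main(void)
    {
        double a[N];
        #pragma omp parallel num_threads(8)
        {
            int tid = omp_get_thread_num();
            if (tid < 4) {                   /* the "subteam" */
                int chunk = N / 4;           /* N iterations over 4 threads */
                for (int i = tid * chunk; i < (tid + 1) * chunk; i++)
                    a[i] = i * 2.0;
            }
            /* the other threads could do unrelated work here */
        }
        printf("a[5] = %g\n", a[5]);         /* prints 10 */
        return 0;
    }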
“…As quoted in a proposal for task parallelism in OpenMP [ACD+07]: "The overhead associated with the creation of parallel regions, the varying levels of support in different implementations, the limits to the total number of threads in the application and to the allowed levels of parallelism, and the impossibility of controlling load balancing, make this approach impractical". Moreover, most advanced OpenMP compilers [TTSY00, HD07, THH+05, BS05, DGC05] (featuring super lightweight threads, work stealing techniques, etc.) are not yet NUMA-aware.…”
Section: Introduction (mentioning)
confidence: 99%
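
For readers unfamiliar with the [ACD+07] proposal, the sketch below shows the tasking style it introduced (later standardized in OpenMP 3.0), where work units are created dynamically instead of opening nested parallel regions. The recursive Fibonacci kernel is the customary textbook illustration, not an example drawn from the cited papers.

    /* Task parallelism in the OpenMP 3.0 style: tasks are spawned
     * recursively instead of nesting parallel regions. */
    #include <omp.h>
    #include <stdio.h>

    static long fib(int n)
    {
        long x, y;
        if (n < 2) return n;
        #pragma omp task shared(x)
        x = fib(n - 1);
        #pragma omp task shared(y)
        y = fib(n - 2);
        #pragma omp taskwait               /* wait for both child tasks */
        return x + y;
    }

    int main(void)
    {
        long r;
        #pragma omp parallel
        #pragma omp single                 /* one thread seeds the task tree */
        r = fib(20);
        printf("fib(20) = %ld\n", r);      /* prints 6765 */
        return 0;
    }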
“…In the case of OpenMP, this can be useful when using nested parallelism, assigning more threads to those groups with high load [9]. The case of MPI is much more complex because the number of processes is statically determined when starting the job (in case of malleable jobs), or when compiling the application (in case of rigid jobs).…”
Section: Related Work (mentioning)
confidence: 99%
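
The following sketch illustrates the load-based assignment described in the quote: groups with higher load receive more inner threads. The load values and the proportional rounding rule are illustrative assumptions, not the algorithm of [9].

    /* Load-weighted thread distribution for nested parallelism:
     * each group's inner team size is proportional to its load.
     * load[] and the rounding rule are assumptions for the sketch. */
    #include <omp.h>
    #include <stdio.h>

    int main(void)
    {
        const int ngroups = 3, total = 12;
        const double load[] = {1.0, 2.0, 3.0};   /* assumed per-group load */
        double sum = 0.0;
        int nt[3];

        for (int g = 0; g < ngroups; g++) sum += load[g];
        for (int g = 0; g < ngroups; g++) {
            nt[g] = (int)(total * load[g] / sum + 0.5);
            if (nt[g] < 1) nt[g] = 1;            /* every group gets a thread */
        }

        omp_set_nested(1);
        #pragma omp parallel num_threads(ngroups)
        {
            int g = omp_get_thread_num();
            #pragma omp parallel num_threads(nt[g])
            {
                if (omp_get_thread_num() == 0)
                    printf("group %d runs with %d threads\n",
                           g, omp_get_num_threads());
            }
        }
        return 0;
    }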
“…Our thesis is that explicit on-chip communication mechanisms support efficiently a wide range of primitives that are common in runtime systems for parallel programming. The OpenMP primitives include: (i) scheduling of parallel loops and asynchronous tasks [5], using either work-stealing [6] or work-sharing [7]; (ii) user-level synchronization including locks, barriers, and reductions and (iii) data privatization in local memories [8].…”
Section: Introduction (mentioning)
confidence: 99%
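
To connect the three primitive classes in the quote to concrete OpenMP code, here is a small self-contained sketch (illustrative only, not from the cited work): (i) a work-sharing loop with dynamic scheduling, (ii) a reduction as user-level synchronization, and (iii) per-thread privatization of a scratch variable.

    /* (i) work-sharing loop with dynamic scheduling,
     * (ii) reduction as user-level synchronization,
     * (iii) private() gives each thread its own scratch storage. */
    #include <omp.h>
    #include <stdio.h>

    #define N 1000

    int main(void)
    {
        double sum = 0.0;
        double scratch;                      /* privatized per thread */

        #pragma omp parallel for schedule(dynamic, 64) \
                private(scratch) reduction(+:sum)
        for (int i = 0; i < N; i++) {
            scratch = i * 0.5;
            sum += scratch;
        }
        printf("sum = %g\n", sum);           /* 0.5 * N*(N-1)/2 = 249750 */
        return 0;
    }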