Proceedings of the 13th International Conference on Supercomputing 1999
DOI: 10.1145/305138.305206
Thread fork/join techniques for multi-level parallelism exploitation in NUMA multiprocessors

Abstract: This paper presents some techniques for efficient thread forking and joining in parallel execution environments, taking into consideration the physical structure of NUMA machines and the support for multi-level parallelization and processor grouping. Two work generation schemes and one join mechanism are designed, implemented, evaluated and compared with the ones used in the IRIX MP library, an efficient implementation which supports a single level of parallelism. Supporting multiple levels of parallelism is a …

Cited by 52 publications (31 citation statements). References 7 publications.
“…We attribute the performance degradation in the directive implementation of LU to less data locality and larger synchronization overhead in the 1-D pipeline used in the OpenMP version as compared to the 2-D pipeline used in the MPI version. This is consistent with the result of a study from [12].…”
Section: The NAS Parallel Benchmarks (supporting)
confidence: 93%
“…The pipeline algorithm is used for parallelizing the NAS benchmark LU in Sect. 4.1 and also described in [12].…”
Section: Pipeline Setup (mentioning)
confidence: 98%
“…The NANOS compiler [12], based on Parafrase-2, has been trying to exploit multi-level parallelism, including coarse-grain parallelism, by using an extended OpenMP API. The OSCAR multigrain parallelizing compiler [13] exploits coarse-grain task parallelism among loops, subroutines and basic blocks [14], and near-fine-grain parallelism among statements inside a basic block [15], in addition to the conventional loop parallelism among iterations.…”
Section: Introduction (mentioning)
confidence: 99%
“…The NANOS compiler [3] exploits multi-level parallelism by using an extended OpenMP API. The PROMIS compiler [4] integrates loop-level parallelism and instruction-level parallelism using a common intermediate language.…”
Section: Introduction (mentioning)
confidence: 99%