2003
DOI: 10.1177/1094342003017001005

Communication and Optimization Aspects of Parallel Programming Models on Hybrid Architectures

Abstract: Most HPC systems are clusters of shared memory nodes. Parallel programming must combine the distributed memory parallelization on the node interconnect with the shared memory parallelization inside each node. The hybrid MPI+OpenMP programming model is compared with pure MPI, compiler based parallelization, and other parallel programming models on hybrid architectures. The paper focuses on bandwidth and latency aspects, and also on whether programming paradigms can separate the optimization of communication and…

Cited by 41 publications (28 citation statements); references 14 publications.
“…Thus, we consider programs that use the common THREAD MASTERONLY model [13]. Its hierarchical decomposition closely matches most large-scale HPC systems, which are comprised of clustered nodes, each of which has multiple cores per node, distributed across multiple processors.…”
Section: Hybrid MPI/OpenMP Terminology
confidence: 99%
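The masteronly model quoted above (all MPI communication issued by the master thread, outside the OpenMP-parallel numerics) can be sketched with a Python threading analogy. This is an illustrative model, not code from the paper: `compute` stands in for the OpenMP-parallel work, `exchange` for an MPI call such as `MPI_Allreduce`, and `ThreadPoolExecutor` for the OpenMP thread team.

```python
from concurrent.futures import ThreadPoolExecutor

def compute(chunk):
    # Stand-in for the OpenMP-parallel numerical kernel.
    return sum(x * x for x in chunk)

def exchange(local):
    # Stand-in for an MPI call; with a single process there is
    # nothing to combine, so the local value is returned as-is.
    return local

def masteronly_step(data, n_threads=4):
    # "Parallel region": all threads compute on disjoint chunks.
    chunks = [data[i::n_threads] for i in range(n_threads)]
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        partials = list(pool.map(compute, chunks))
    # Outside the parallel region only the master communicates.
    return exchange(sum(partials))
```

The key structural point the excerpt makes is the strict alternation of phases: threaded computation, then single-threaded communication, with no MPI calls from inside the parallel region.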
“…The notion of overlapping communication and computation in various ways has been described before [10,11] but we present here a new way based on the new functionality of the OpenMP tasking model. OpenMP version 3.0 introduces the task directive, which allows the programmer to specify a unit of parallel work called an explicit task, which express unstructured parallelism and defines dynamically generated work units that will be processed by the team [1].…”
Section: The GTS Particle Shifter and How To Fight
confidence: 99%
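The OpenMP 3.0 `task` construct described in this excerpt creates dynamically generated work units that any thread of the team may execute. A rough Python analogy (hypothetical, not from the cited work) uses `ThreadPoolExecutor.submit`, where each `submit` call plays the role of one `#pragma omp task`:

```python
from concurrent.futures import ThreadPoolExecutor

def process(item):
    # Stand-in for the body of one explicit task.
    return item * 2

def run_tasks(items, n_threads=4):
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        # Each submit() models one dynamically generated explicit task;
        # the pool's worker threads play the role of the OpenMP team.
        futures = [pool.submit(process, it) for it in items]
        return sorted(f.result() for f in futures)
```

As in OpenMP tasking, the number of work units need not match the number of threads, which is what makes the construct suitable for the unstructured parallelism the excerpt mentions.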
“…Which approach -using OpenMP tasking or new MPI non-blocking collectives -performs best remains to be seen once the new MPI 3.0 version is available. Rabenseifner and Wellein [11] point out that the benefit is limited, mainly because the communication time can be hidden by parallelizing it to the numerical threads (which reduces the available threads for numerics by one). Therefore, without parallelizing communication with computation the maximum benefit ratio is (2 − 1/n) on n threads.…”
Section: ! ADDING SHIFTED PARTICLES FROM LEFT !
confidence: 99%
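The (2 − 1/n) bound quoted above follows from a simple time model: without overlap, a step costs the serial communication time plus the computation time spread over n threads; with overlap, one thread is dedicated to communication and the computation runs on the remaining n − 1 threads, so the step costs the maximum of the two. A small sketch of this model (variable names are mine, the formula is from the excerpt):

```python
def overlap_benefit(t_comm, t_comp, n):
    """Speedup from hiding non-parallelized communication behind
    computation by dedicating 1 of n threads to communication."""
    baseline = t_comm + t_comp / n              # comm, then comp on n threads
    overlapped = max(t_comm, t_comp / (n - 1))  # comp on n-1 threads hides comm
    return baseline / overlapped
```

The ratio is maximized when the communication exactly fills the overlapped compute time, i.e. t_comm = t_comp / (n − 1); substituting gives 1 + (n − 1)/n = 2 − 1/n, matching the bound Rabenseifner and Wellein state.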
“…Nevertheless, a lot of important scientific work enlightens the complexity of the many aspects that affect the overall performance of hybrid programs ( [2], [8], [10]). Also, the need for a multi-threading MPI implementation that will efficiently support the hybrid model has been spotted by the research community ( [11], [9]).…”
Section: Introduction
confidence: 99%