2010
DOI: 10.1007/978-3-642-15646-5_2

Enabling Concurrent Multithreaded MPI Communication on Multicore Petascale Systems

Abstract: With the ever-increasing numbers of cores per node on HPC systems, applications are increasingly using threads to exploit the shared memory within a node, combined with MPI across nodes. Achieving high performance when a large number of concurrent threads make MPI calls is a challenging task for an MPI implementation. We describe the design and implementation of our solution in MPICH2 to achieve high-performance multithreaded communication on the IBM Blue Gene/P. We use a combination of a multichannel-…
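The scenario the abstract describes, many threads inside one MPI process issuing MPI calls concurrently, corresponds to initializing MPI at the MPI_THREAD_MULTIPLE level. Below is a minimal hybrid MPI+pthreads sketch of that pattern; NUM_THREADS, worker, and the tag-per-thread matching scheme are illustrative choices, not details from the paper.

#include <mpi.h>
#include <pthread.h>
#include <stdio.h>

#define NUM_THREADS 4

/* Each thread communicates under its own tag, so concurrent calls
 * from different threads match independently on the peer rank. */
static void *worker(void *arg)
{
    int tid = (int)(size_t)arg;
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int right = (rank + 1) % size;          /* ring neighbors */
    int left  = (rank - 1 + size) % size;
    int out = rank * NUM_THREADS + tid, in = -1;

    MPI_Sendrecv(&out, 1, MPI_INT, right, tid,
                 &in,  1, MPI_INT, left,  tid,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    printf("rank %d thread %d received %d\n", rank, tid, in);
    return NULL;
}

int main(int argc, char **argv)
{
    int provided;
    /* Ask for full thread support; the library reports what it grants. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
    if (provided < MPI_THREAD_MULTIPLE)
        MPI_Abort(MPI_COMM_WORLD, 1);

    pthread_t t[NUM_THREADS];
    for (int i = 0; i < NUM_THREADS; i++)
        pthread_create(&t[i], NULL, worker, (void *)(size_t)i);
    for (int i = 0; i < NUM_THREADS; i++)
        pthread_join(t[i], NULL);

    MPI_Finalize();
    return 0;
}

When every thread funnels through one implementation-wide lock, the concurrent calls above serialize; the paper's contribution is making them proceed in parallel.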

Cited by 24 publications (19 citation statements). References 5 publications.
“…We continued to work on the development of lightweight thread support for MPICH. Work early in the project was performed in collaboration with both Argonne and the IBM Blue Gene team [10]. Work on fine-grain multithreading support showed how to avoid excessive lock overhead in an MPI implementation [3,2].…”
Section: Some Of the Most Interesting Results From This Project Addre…
confidence: 99%
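The lock-overhead result this quote refers to hinges on replacing one global critical section with finer-grained ones. A hedged illustration of the granularity difference, using a hypothetical queue structure rather than MPICH internals:

#include <pthread.h>
#include <stdlib.h>

/* Coarse grain: every queue shares one global lock, so unrelated
 * operations serialize.  Fine grain: each queue carries its own
 * lock, so threads touching different queues proceed in parallel. */
typedef struct node { struct node *next; void *payload; } node_t;

typedef struct {
    node_t          *head;
    pthread_mutex_t  lock;   /* per-queue lock (fine grain) */
} queue_t;

void queue_init(queue_t *q)
{
    q->head = NULL;
    pthread_mutex_init(&q->lock, NULL);
}

void queue_push(queue_t *q, node_t *n)
{
    pthread_mutex_lock(&q->lock);    /* contends only on this queue */
    n->next = q->head;
    q->head = n;
    pthread_mutex_unlock(&q->lock);
}

node_t *queue_pop(queue_t *q)
{
    pthread_mutex_lock(&q->lock);
    node_t *n = q->head;
    if (n) q->head = n->next;
    pthread_mutex_unlock(&q->lock);
    return n;
}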
“…This feature can be used in a mixed programming model, like the one explored by researchers in [22], where UPC and MPI were used to scale a memory-bound application. In [16], the authors use parallel communication channels to speed up the MPI message rate. PAMI extends and generalizes this notion of communication parallelism using PAMI Contexts and uses a new message handoff technique to accelerate the message rate.…”
Section: A. Related Work
confidence: 99%
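PAMI Contexts are IBM-specific, but the channel-parallelism idea they generalize, giving each thread an independent communication path so traffic does not funnel through one shared state, can be sketched portably. The sketch below uses one duplicated MPI communicator per thread as a stand-in channel; it is an analogy, not the PAMI API, and NUM_CHANNELS is an illustrative name.

#include <mpi.h>
#include <pthread.h>

#define NUM_CHANNELS 4

/* One duplicated communicator per thread: message matching is scoped
 * per communicator, so each thread's traffic flows on its own channel. */
static MPI_Comm channels[NUM_CHANNELS];

static void *channel_worker(void *arg)
{
    int ch = (int)(size_t)arg;
    int rank, size;
    MPI_Comm_rank(channels[ch], &rank);
    MPI_Comm_size(channels[ch], &size);

    int out = rank, in = -1;
    /* The same tag everywhere is safe: the communicators keep channels apart. */
    MPI_Sendrecv(&out, 1, MPI_INT, (rank + 1) % size, 0,
                 &in,  1, MPI_INT, (rank - 1 + size) % size, 0,
                 channels[ch], MPI_STATUS_IGNORE);
    return NULL;
}

int main(int argc, char **argv)
{
    int provided;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
    if (provided < MPI_THREAD_MULTIPLE)
        MPI_Abort(MPI_COMM_WORLD, 1);

    for (int i = 0; i < NUM_CHANNELS; i++)
        MPI_Comm_dup(MPI_COMM_WORLD, &channels[i]);

    pthread_t t[NUM_CHANNELS];
    for (int i = 0; i < NUM_CHANNELS; i++)
        pthread_create(&t[i], NULL, channel_worker, (void *)(size_t)i);
    for (int i = 0; i < NUM_CHANNELS; i++)
        pthread_join(t[i], NULL);

    for (int i = 0; i < NUM_CHANNELS; i++)
        MPI_Comm_free(&channels[i]);
    MPI_Finalize();
    return 0;
}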
“…Such an implementation is thread safe, but has limited scalability due to the global lock. We explored fine-grained locking and lockless techniques in MPICH2 [13,16]. We extended request allocators by creating thread-private pools to minimize locking overheads.…”
Section: A Multi-Threaded MPI Over PAMI
confidence: 99%
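A thread-private request pool, as described in this quote, removes the allocator lock from the fast path because each thread only ever touches its own free list. A hypothetical sketch follows; request_t and the pool layout are illustrative, not MPICH2's actual structures, and __thread is GCC/Clang thread-local storage.

#include <stdlib.h>

typedef struct request {
    struct request *next;
    /* ... request fields ... */
} request_t;

/* One free list per thread: no lock is taken on the fast path,
 * because no other thread ever reads or writes this list. */
static __thread request_t *free_list = NULL;

request_t *request_alloc(void)
{
    request_t *r = free_list;
    if (r) {                      /* fast path: pop from the private pool */
        free_list = r->next;
        return r;
    }
    return malloc(sizeof(request_t));  /* slow path: fall back to the heap */
}

void request_free(request_t *r)
{
    r->next = free_list;          /* return to this thread's pool */
    free_list = r;
}

A request freed by a different thread than the one that allocated it simply migrates between pools, so no cross-thread synchronization is ever needed in this sketch.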
“…While there is little related work on the endpoints as introduced in [10], a large body of work exists on the exploitation of shared memory nodes within MPI or other parallel programming languages like UPC [7], [9], [13]. Many of these papers focus on optimizing various communication primitives by means of a shared memory region.…”
Section: Related Work
confidence: 99%
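The shared-memory-region optimizations surveyed here are most directly expressed today through MPI-3 shared-memory windows, which let node-local ranks read each other's memory with plain loads and stores. A minimal sketch, included only to make the mechanism concrete (the MPI-3 interface post-dates some of the cited work):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    /* Group the ranks that share a physical node. */
    MPI_Comm node;
    MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                        MPI_INFO_NULL, &node);

    int nrank, nsize;
    MPI_Comm_rank(node, &nrank);
    MPI_Comm_size(node, &nsize);

    /* Allocate one int per node-local rank inside one shared segment. */
    int *mine;
    MPI_Win win;
    MPI_Win_allocate_shared(sizeof(int), sizeof(int), MPI_INFO_NULL,
                            node, &mine, &win);
    *mine = nrank * nrank;

    MPI_Win_fence(0, win);   /* make all local writes visible */

    /* Rank 0 reads rank 1's slot directly, with no message at all. */
    if (nrank == 0 && nsize > 1) {
        MPI_Aint size;
        int disp;
        int *peer;
        MPI_Win_shared_query(win, 1, &size, &disp, &peer);
        printf("neighbor wrote %d\n", *peer);
    }

    MPI_Win_free(&win);
    MPI_Comm_free(&node);
    MPI_Finalize();
    return 0;
}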