2009 17th IEEE Symposium on High Performance Interconnects 2009
DOI: 10.1109/hoti.2009.12

MPI Collective Communications on The Blue Gene/P Supercomputer: Algorithms and Optimizations

Cited by 36 publications (33 citation statements)
References 7 publications
“…We do this by comparing theoretical bounds based on first principles gathered from the documentation [7] with benchmarked performance on MPI_COMM_WORLD on full allocations. Models for collective communications on other communicators and non-cubic allocations could be derived with a similar method.…”
Section: Process-to-node Mapping (mentioning)
confidence: 99%
“…This is the most accurate technique in our hierarchy and most convenient for the application developer. For example, the cost of a barrier on a BlueGene/P (BG/P) is T_BAR = 0.95 µs [7], independent of P.…”
Section: Previous Work and A General Approach To Modeling (mentioning)
confidence: 99%
“…So, on a torus network, a message must travel l · d/2 hops at each step, rather than 1 hop as in SD-Cannon. The Blue Gene/P machine provides efficient broadcast collectives that work at a fine granularity and incur little latency overhead [6]. However, on a machine without this type of topology-aware collectives, SD-Cannon would have a strong advantage, as messages would need to travel fewer hops.…”
Section: Discussion (mentioning)
confidence: 99%
“…Each update requires a broadcast along a row or column of processors. If a higher-dimensional torus is flattened into each row and column of the mapping, rectangular collective algorithms [13,6,10] can utilize all dimensions of the network. Rectangular algorithms subdivide and pipeline the messages into edge-disjoint spanning trees formed by traversing the network in different dimensional orders.…”
Section: Introduction (mentioning)
confidence: 99%