2016 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)
DOI: 10.1109/ipdpsw.2016.139
Topology-Aware Rank Reordering for MPI Collectives

Cited by 19 publications (5 citation statements); references 17 publications.
“…Hierarchical algorithms, as explored in [11]-[13], the multi-leader approach [14], and multi-lane communication methods [15], address bandwidth limitations inherent in electrical links. Topology-aware collective algorithms, such as HierKNEM [16] and Rank Reordering [17], aim to reduce link traversals in both intra- and inter-node communication. Approaches focusing on symmetric multiprocessing (SMP) and multi-core clusters [18]-[21], along with holistic optimization for various topologies [22]-[24], have also been investigated.…”
Section: Related Work (mentioning)
confidence: 99%
“…On every step s, with 0 ≤ s < log_2 p, a process with rank r exchanges data in a pairwise fashion with the process of rank r ⊕ 2^s, where ⊕ denotes the bitwise exclusive or. As every process sends all the data it has received so far, the number of blocks doubles at every step, and the cost of Recursive Doubling is given by C_rd = (log_2 p)·α + ((p − 1)/p)·m·β [16]. The algorithm is applicable only when the number of processes is a power of two, which is the only case in which both MPICH and Open MPI employ it.…”
Section: A. Allgather Algorithms (mentioning)
confidence: 99%
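The excerpt above fully specifies the recursive-doubling exchange pattern, so a short sketch can make the cost formula concrete. The following C/MPI program is an illustration only, assuming the number of processes is a power of two and that each rank contributes a single int block; the buffer layout and the use of MPI_Sendrecv are choices of this sketch, not taken from the cited implementations.

/* Recursive-doubling allgather sketch: at step s each rank exchanges the
 * blocks gathered so far with partner r XOR 2^s, so the amount of data
 * doubles every step. Assumes the number of processes p is a power of two. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int r, p;
    MPI_Comm_rank(MPI_COMM_WORLD, &r);
    MPI_Comm_size(MPI_COMM_WORLD, &p);    /* assumed to be a power of two */

    int *buf = malloc((size_t)p * sizeof(int));
    buf[r] = r;                           /* each rank contributes one block */

    for (int step = 1; step < p; step <<= 1) {
        int partner = r ^ step;           /* pairwise partner r XOR 2^s */
        /* The contiguous run of blocks gathered so far starts at the
         * step-aligned offset of each rank and is `step` blocks long. */
        int my_off      = (r / step) * step;
        int partner_off = (partner / step) * step;
        MPI_Sendrecv(buf + my_off,      step, MPI_INT, partner, 0,
                     buf + partner_off, step, MPI_INT, partner, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }

    if (r == 0) {                         /* every rank now holds all p blocks */
        for (int i = 0; i < p; i++) printf("%d ", buf[i]);
        printf("\n");
    }
    free(buf);
    MPI_Finalize();
    return 0;
}

With log_2 p steps and (p − 1)/p · m bytes moved per rank in total, the loop above matches the C_rd = (log_2 p)·α + ((p − 1)/p)·m·β cost quoted in the excerpt.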
“…The works of [16] and [6] take a more Allgather-focused approach, using the collective's known communication pattern to create mappings better suited to its algorithms. The first proposes fine-tuned heuristics for Ring, Recursive Doubling, and Binomial broadcast (a possible final component of an Allgather or Broadcast execution), with experimental results showing improvements of up to 78%.…”
Section: Related Work (mentioning)
confidence: 99%
“…There are, however, approaches that do not carry this dependence. The authors in [24], for example, explore four heuristics that perform rank reordering to realize run-time topology awareness for the MPI Allgather primitive. The corresponding approach does not rely on an application profile.…”
Section: Related Work (mentioning)
confidence: 99%
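As a complement to the statement above, the sketch below shows one generic way a run-time rank reordering can be applied in MPI without any application profile: compute a per-process key and build a new communicator ordered by that key with MPI_Comm_split. The key used here (ordering by on-node rank obtained from MPI_Comm_split_type) is a placeholder assumption for illustration, not one of the four heuristics of [24].

/* Generic run-time rank reordering: derive placement information with
 * MPI_Comm_split_type, then let MPI_Comm_split order a new communicator
 * by a computed key. The key below is a placeholder, not a heuristic from
 * the cited work. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int world_rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

    /* Ranks sharing a node end up in the same node_comm (run-time
     * topology information, no profiling required). */
    MPI_Comm node_comm;
    MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                        MPI_INFO_NULL, &node_comm);
    int node_rank;
    MPI_Comm_rank(node_comm, &node_rank);

    /* Placeholder key: order by on-node rank first (ties broken by world
     * rank), turning a blocked placement into a round-robin one. A
     * topology-aware heuristic would compute this key differently. */
    int key = node_rank;

    MPI_Comm reordered;
    MPI_Comm_split(MPI_COMM_WORLD, 0, key, &reordered);

    int new_rank;
    MPI_Comm_rank(reordered, &new_rank);
    printf("world rank %d -> reordered rank %d\n", world_rank, new_rank);

    /* Collectives such as MPI_Allgather issued on `reordered` now operate
     * on the new rank order. */
    MPI_Comm_free(&reordered);
    MPI_Comm_free(&node_comm);
    MPI_Finalize();
    return 0;
}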