2017 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID)
DOI: 10.1109/ccgrid.2017.32

Formal Modeling and Performance Evaluation of a Run-Time Rank Remapping Technique in Broadcast, Allgather and Allreduce MPI Collective Operations

Cited by 4 publications
(7 citation statements)
References 14 publications
“…In addition, this collective has a hierarchical design adapted to the target platform. The reduction is solved following a specific rank ordering in this hierarchical design [63] generating dependencies between intermediate calculations. This fact combined with several tasks sharing one CPU (oversubscription) may degrade performance proportional to the size of the application.…”
Section: Evaluation Results (mentioning)
confidence: 99%
“…On every step, a process with rank r will send a block to the process with rank r+1 and receive another from the process with rank r−1 (wrapping around if a destination or source is out of bounds). Each process's own block is sent on the first step, while on all others the block received on the previous step is forwarded [7]. Hereafter, the number of processes involved in the algorithm is represented by p, while m represents the total amount of data that a process must have at the end of the operation.…”
Section: A Allgather Algorithms (mentioning)
confidence: 99%
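The ring exchange described in the statement above can be illustrated with a small simulation. This is only a sketch, not code from the cited paper or from any MPI library: the function name `ring_allgather`, the list-based representation of processes, and the assumption that the exchange runs for p − 1 steps are all introduced here for illustration.

```python
def ring_allgather(blocks):
    """Simulate a ring allgather in plain Python (hypothetical helper).

    blocks[r] is the block initially owned by rank r. The return value
    gathered[r] lists every block rank r holds at the end, ordered by
    the rank that originally owned it.
    """
    p = len(blocks)
    # gathered[r][i] is the block that originated at rank i, once rank r has it.
    gathered = [[None] * p for _ in range(p)]
    for r in range(p):
        gathered[r][r] = blocks[r]

    # in_flight[r] is the origin index of the block rank r sends on this step.
    # On the first step every rank sends its own block.
    in_flight = list(range(p))

    # Assumption: p - 1 steps are enough for every block to visit every rank.
    for _ in range(p - 1):
        # Rank r receives from rank r - 1, wrapping around at the ends.
        received = [in_flight[(r - 1) % p] for r in range(p)]
        for r in range(p):
            src = (r - 1) % p
            gathered[r][received[r]] = gathered[src][received[r]]
        # On the next step each rank forwards the block it just received.
        in_flight = received
    return gathered


if __name__ == "__main__":
    for rank, held in enumerate(ring_allgather(["a", "b", "c", "d"])):
        print(rank, held)
```

In a real MPI program each rank would run concurrently and exchange blocks with point-to-point calls; the sequential loop here only mirrors the order in which blocks move around the ring.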
“…The formal definitions of the algorithms assume equally balanced communication costs to all peers, but computing clusters and supercomputers often employ hierarchical network topologies [6]. On these networks the cost for performing communication between two nodes is highly dependent on the physical location of each peer [7], and the further away they are, the longer are the physical paths between them and therefore the higher the latency. From a bandwidth perspective, the further away two nodes are the higher is the chance that their communication will cross the core of the network, whose bandwidth is more expensive and supports less saturation than the edge [17], possibly leading to slowdowns or contentions.…”
Section: B Problem Formulation (mentioning)
confidence: 99%