2016
DOI: 10.1007/s11227-016-1779-7

Hierarchical redesign of classic MPI reduction algorithms

Abstract: Optimization of MPI collective communication operations has been an active research topic since the advent of MPI in the 1990s. Many general and architecture-specific collective algorithms have been proposed and implemented in the state-of-the-art MPI implementations. Hierarchical topology-oblivious transformation of existing communication algorithms has been recently proposed as a new promising approach to optimization of MPI collective communication algorithms and MPI-based applications. This approach has been suc…

Cited by 25 publications (12 citation statements)
References 17 publications
“…The MPI instances usually optimize the CCPs via the implementation of a variety of algorithms (or communication schemes), in principle selecting the most appropriate option at execution time depending, for example, on the message size, number of processes, network topology, etc. For the particular case of (the non-blocking) MPI_Iallreduce, the following list briefly describes some of the most popular algorithms (see [9,12,30] for additional details):…”
Section: A Family Of Algorithms (mentioning)
confidence: 99%
“…Existing analytical modelling approaches [5], [23], [40] estimate the execution time of the binary and binomial tree broadcast algorithms as follows:…”
Section: Comparison Of the Proposed Analytical Models Against The State Of The Art (mentioning)
confidence: 99%
“…Recently, apart of the imbalanced PAPs subject [9], the focus of MPI reduce algorithms were mainly scoped on hierarchical optimization, where the processes are distributed over a compute cluster, and more than one process is assigned to the same node. In [7], Hasanov et al proposed an algorithm categorizing the processes into the local (in the same node) and global (distributed over a compute cluster) communicators, where the reduce can be performed in two phases, increasing the performance of the operation. In [21], Shan et al proposed utilization of idle threads on manycore processors and data compression to boost MPI reduce performance, and in [28], Zhao et al showed that usage of k-nomial tree algorithms has the advantage over a typical binomial solution.…”
Section: Related Work (mentioning)
confidence: 99%
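
The two-phase reduce described in the statement above can be illustrated with a small simulation. This is a minimal sketch in plain Python, not the cited authors' implementation and not an MPI program: the function names `hierarchical_reduce` and `flat_reduce`, the `node_of_rank` mapping, and the choice of addition as the reduction operator are all assumptions made for illustration. It only shows that reducing inside each node first (local phase) and then across the per-node partial results (global phase) gives the same answer as a single flat reduce when the operator is associative and commutative.

```python
from collections import defaultdict
from functools import reduce
import operator


def flat_reduce(values, op=operator.add):
    """Reference result: reduce all contributions in one pass."""
    return reduce(op, values)


def hierarchical_reduce(values, node_of_rank, op=operator.add):
    """Simulate a two-phase reduce (hypothetical helper, for illustration).

    values[r]       -- contribution of rank r
    node_of_rank[r] -- identifier of the node assumed to host rank r
    op              -- reduction operator; assumed associative and commutative
    """
    # Phase 1 (local): group contributions by node and reduce within each node.
    per_node = defaultdict(list)
    for rank, value in enumerate(values):
        per_node[node_of_rank[rank]].append(value)
    partials = {node: reduce(op, vals) for node, vals in per_node.items()}

    # Phase 2 (global): reduce the per-node partial results.
    # Nodes are visited in sorted order so the simulation is deterministic.
    result = reduce(op, (partials[node] for node in sorted(partials)))
    return partials, result


if __name__ == "__main__":
    # Eight ranks on two nodes, four ranks per node (an assumed layout).
    contributions = [1, 2, 3, 4, 5, 6, 7, 8]
    placement = [0, 0, 0, 0, 1, 1, 1, 1]
    partials, total = hierarchical_reduce(contributions, placement)
    print(partials, total, flat_reduce(contributions))
```

In a real MPI program the local group is commonly obtained by splitting the communicator by shared-memory domain (for example with `MPI_Comm_split_type` and `MPI_COMM_TYPE_SHARED`); whether the cited work builds its groups this way is not stated in the excerpt.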