2013 IEEE International Conference on Cluster Computing (CLUSTER)
DOI: 10.1109/cluster.2013.6702676
Optimizing blocking and nonblocking reduction operations for multicore systems: Hierarchical design and implementation

Abstract: Many scientific simulations, using the Message Passing Interface (MPI) programming model, are sensitive to the performance and scalability of reduction collective operations such as MPI Allreduce and MPI Reduce. These operations are the most widely used abstractions to perform mathematical operations over all processes that are part of the simulation. In this work, we propose a hierarchical design to implement the reduction operations on multicore systems. This design aims to improve the efficiency of reductio…
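The hierarchical design described in the abstract can be illustrated with a minimal sketch. This is an assumption-laden simulation, not the paper's implementation: the real design operates over MPI ranks on multicore nodes, whereas here sum is assumed as the reduction operator, "nodes" are modeled as groups of consecutive ranks, and the three phases (intra-node reduce to a leader, inter-node reduce among leaders, broadcast back) run sequentially in one process.

```python
def hierarchical_allreduce(values, ranks_per_node):
    """Simulate a two-level hierarchical allreduce (sum assumed as operator).

    values: one value per simulated MPI rank.
    ranks_per_node: hypothetical number of ranks grouped on each node.
    """
    # Phase 1: intra-node reduction -- each node's ranks combine into a
    # per-node partial result held by that node's leader.
    leader_partials = [
        sum(values[start:start + ranks_per_node])
        for start in range(0, len(values), ranks_per_node)
    ]
    # Phase 2: inter-node reduction among the node leaders only, which
    # keeps the expensive network phase proportional to the node count.
    total = sum(leader_partials)
    # Phase 3: broadcast the final result back to every rank.
    return [total] * len(values)

# Two simulated nodes with three ranks each; every rank ends with the sum.
print(hierarchical_allreduce([1, 2, 3, 4, 5, 6], ranks_per_node=3))
```

The point of the hierarchy is that only one rank per node participates in the inter-node phase, so intra-node traffic stays on fast shared memory.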

Cited by 13 publications (7 citation statements) | References 10 publications (8 reference statements)
“…However, several works have been done on optimizing these trees for MPI, e.g. [15], [4], and this is not the scope of this work. It is, however, worth mentioning that these structures can also be used for GGAS and GPU clusters.…”
Section: Work Sharing: Data Distribution Over Multiple GPUs
confidence: 93%
“…To name a few, in [4], [15] and [3], blocking and nonblocking allreduce and reduce operations are optimized. Also, since GPUs are highly suitable to perform parallel reductions, the in-core reduction on a single GPU has been highly optimized, like described in [16].…”
Section: Introduction
confidence: 99%
“…Algorithmic work performed by Venkata et al [33] developed short-vector blocking and nonblocking reduction and barrier operations using a recursive k-ing type host-based approach, and extended work by Thakur [31]. Vadhiar et al [32] presented implementations of blocking reduction, gather and broadcast operations using sequential, chain, binary, binomial tree and Rabenseifner algorithms.…”
Section: Previous Work
confidence: 99%
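The "recursive k-ing" approach mentioned above generalizes recursive doubling: in each round, every rank exchanges partial results with partners at increasing distances. A hedged sketch of the k=2 special case (recursive doubling, sum assumed as operator, power-of-two process count assumed) shows the pattern; the cited work's actual algorithm handles general k and hardware offload.

```python
def recursive_doubling_allreduce(values):
    """Simulate recursive-doubling allreduce over p ranks (p a power of two).

    In round r, rank i exchanges its partial sum with rank i XOR 2**r,
    so after log2(p) rounds every rank holds the full reduction.
    """
    p = len(values)
    assert p & (p - 1) == 0, "power-of-two process count assumed"
    vals = list(values)
    dist = 1
    while dist < p:
        # All ranks exchange with their partner at this distance in parallel.
        vals = [vals[r] + vals[r ^ dist] for r in range(p)]
        dist <<= 1
    return vals

# Four simulated ranks: two rounds suffice, every rank gets the total.
print(recursive_doubling_allreduce([1, 2, 3, 4]))
```

Each rank sends and receives log2(p) messages in total, which is why these tree-structured schemes scale well for short vectors.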
“…According to research studies over the past two decades [2,3], MPI reduction operations, particularly MPI reduce and allreduce, are the most used collective operations in scientific applications. In the reduce operation, each node i owns a vector x i of n elements.…”
Section: Introduction
confidence: 99%
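The reduce operation described in that statement can be made concrete with a small sketch: each of the p nodes owns a vector x_i of n elements, and reduce combines the vectors elementwise (sum is assumed here as the operator; MPI also supports min, max, product, and user-defined operations).

```python
# Three simulated nodes, each owning a vector of n = 3 elements.
x = [
    [1, 2, 3],  # node 0's vector x_0
    [4, 5, 6],  # node 1's vector x_1
    [7, 8, 9],  # node 2's vector x_2
]

# Elementwise reduction: result[j] = sum over i of x_i[j].
# In MPI Reduce the result lands on one root; in MPI Allreduce, on all nodes.
result = [sum(column) for column in zip(*x)]
print(result)
```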