ACM/IEEE SC 2000 Conference (SC'00) 2000
DOI: 10.1109/sc.2000.10024

Automatically Tuned Collective Communications

Abstract: The performance of the MPI's collective communications is critical in most MPI-based applications. A general algorithm for a given collective communication operation may not give good performance on all systems due to the differences in architectures, network parameters and the storage capacity of the underlying MPI implementation. In this paper, we discuss an approach in which the collective communications are tuned for a given system by conducting a series of experiments on the system. We also discuss a dyna…
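As a rough illustration of the experiment-driven tuning the abstract describes, the sketch below picks, for each message size, whichever candidate broadcast algorithm is cheapest and stores the choice in a lookup table. Everything in it is an assumption made for illustration, not the paper's method: the two cost models (a binomial tree and a pipelined chain), the parameters `alpha` (per-message latency), `beta` (per-byte cost) and `seg` (segment size), and the function names are all hypothetical, and the cost functions stand in for timings that a real tuner would measure on the target system.

```python
import math
from typing import Callable, Dict, Iterable

# A cost function maps a message size in bytes to an estimated time in microseconds.
CostFn = Callable[[int], float]


def binomial_tree(p: int, alpha: float, beta: float) -> CostFn:
    """Hypothetical model: ceil(log2 p) rounds, each sending the whole message."""
    rounds = math.ceil(math.log2(p))
    return lambda m: rounds * (alpha + beta * m)


def pipelined_chain(p: int, alpha: float, beta: float, seg: int) -> CostFn:
    """Hypothetical model: the message is cut into segments that flow down a chain."""
    def cost(m: int) -> float:
        nseg = max(1, math.ceil(m / seg))
        # p - 1 hops to fill the pipeline, then one step per remaining segment.
        # Every step is charged a full segment, which slightly overestimates the last one.
        return (p - 2 + nseg) * (alpha + beta * min(m, seg))
    return cost


def tune(candidates: Dict[str, CostFn], sizes: Iterable[int]) -> Dict[int, str]:
    """For each message size, keep the name of the cheapest candidate."""
    return {m: min(candidates, key=lambda name: candidates[name](m)) for m in sizes}


# Assumed system parameters: 8 processes, 10 us latency, 0.01 us per byte, 1 KiB segments.
CANDIDATES = {
    "binomial_tree": binomial_tree(p=8, alpha=10.0, beta=0.01),
    "pipelined_chain": pipelined_chain(p=8, alpha=10.0, beta=0.01, seg=1024),
}
TABLE = tune(CANDIDATES, [64, 8192, 65536, 1048576])
```

Under these assumed parameters the table selects the binomial tree for the small sizes and the pipelined chain for the large ones. In a real tuner each cost function would be replaced by a timed run of the corresponding algorithm on the target machine.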

Citations: cited by 98 publications (83 citation statements)
References: 7 publications
“…Similar to other architecture specific collective communication algorithms [8,10,17], the techniques developed in this paper can be used in advanced communication libraries [7,9,13,30]. Our research extends the work in [11,23,28] by considering multiple switches. As shown in the performance study, pipelined broadcast using topology unaware trees in such an environment may yield extremely poor performance.…”
Section: Related Work (mentioning)
confidence: 76%
“…In [29], a pipelined broadcast technique is proposed for the mesh topology. The effectiveness of pipelined broadcast in cluster environments was demonstrated in [11,23,28]. It was shown that pipelined broadcast using topology unaware trees can be very efficient for clusters connected by a single switch.…”
Section: Related Work (mentioning)
confidence: 99%
“…However, these works consider mostly homogeneous environments, and does not take the diversity of processor speed into consideration, mostly because these approaches rely on recursive processor halving techniques and low level communication protocol improvements, and they may not perform well in heterogeneous clusters. Vadhiyar et al. [33] gave an experimental approach to tune the collective communication, including reduction, via exhaustive search on heterogeneous clusters.…”
Section: Introduction (mentioning)
confidence: 99%
“…We want to emphasize that the exhaustive search techniques in this paper are guided by the theoretical foundation, and these techniques can be incorporated into the framework suggested by Vadhiyar et. al [33] that actually measures the key communication parameters, to further optimize the search efficiency.…”
Section: Introduction (mentioning)
confidence: 99%
“…The current implementation is optimized for InfiniBand™ and implements different algorithms for most collective operations (cf. [15]). …”
Section: Implementation With Non-blocking Point-to-point (mentioning)
confidence: 99%