2007
DOI: 10.1007/s10586-007-0012-0

Performance analysis of MPI collective operations

Abstract: Previous studies of application usage show that the performance of collective communications is critical for high-performance computing. Despite active research in the field, a solution to the collective communication optimization problem that is both general and feasible is still missing. In this paper, we analyze and attempt to improve intracluster collective communication in the context of the widely deployed MPI programming paradigm by extending accepted models of point-to-point communication, such as Hockney, Lo…

Cited by 177 publications
(125 citation statements)
References 22 publications
“…Both algorithms are used in practice depending on the proportion of P and the message size s (cf. [9]). Figure 7 shows the influence of pipelining on a measurement with P = 16 and n = 5.…”
Section: Simulation Results (mentioning)
confidence: 99%
“…Other models are general-purpose and attempt to be architecture-independent, for example the PRAM [4,5], BSP [6], or the LogP [7] models. Several studies have compared the accuracy of those models [8,9]. In general, it seems that the LogP model family bridges the complexity of real-world interconnection networks and the usability of an abstract network model fairly well.…”
Section: Introduction (mentioning)
confidence: 99%
“…In [8], this has been demonstrated for MPI's collective communication operations, even for hierarchical communication networks with different sets of performance parameters. Pješivac-Grbović et al. [10] have shown that PLogP provides flexible and accurate performance modeling.…”
Section: The LogP Model (mentioning)
confidence: 99%
“…The CPU idle time during the communication will be modelled and benchmarked. Precise models for collective operations are presented in [20] and for barrier synchronization in [21]. Both studies show that the LogP [22] or LogGP [23] model is able to predict the communication time sufficiently accurately if the processes enter the collective operation simultaneously.…”
Section: Modelling CPU and Network Activity (mentioning)
confidence: 99%
“…We assume the dissemination principle to perform MPI_BARRIER (1), analyzed in [21]. Our model for MPI_ALLREDUCE (2) assumes a simple binomial tree reduce implementation followed by MPI_BCAST and our MPI_BCAST (3) model assumes a binomial tree implementation (compare proposed models in [20]).…”
Section: Modelling CPU and Network Activity (mentioning)
confidence: 99%
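
The citation statement above names two communication patterns, a binomial tree broadcast and a dissemination barrier. The following is a minimal sketch of how those patterns are commonly scheduled, not the implementation used by the cited paper or by any MPI library. The function names are hypothetical, and the cost function assumes the simple Hockney form (latency `alpha` plus `beta` per byte, one message per round), which is only one of the models the abstract mentions.

```python
def binomial_bcast_schedule(num_procs, root=0):
    """Return the rounds of a binomial tree broadcast.

    Each round is a list of (sender, receiver) rank pairs. In the round
    with distance `mask`, every rank that already holds the data
    (relative ranks 0..mask-1) forwards it to the rank `mask` further on.
    """
    rounds = []
    mask = 1
    while mask < num_procs:
        sends = []
        for rel in range(mask):
            peer = rel + mask
            if peer < num_procs:
                # Shift relative ranks so that the tree is rooted at `root`.
                sends.append(((rel + root) % num_procs,
                              (peer + root) % num_procs))
        rounds.append(sends)
        mask <<= 1
    return rounds


def dissemination_barrier_schedule(num_procs):
    """Return the rounds of a dissemination barrier.

    In the round with distance `dist`, every rank r signals rank
    (r + dist) mod P; the distance doubles each round.
    """
    rounds = []
    dist = 1
    while dist < num_procs:
        rounds.append([(r, (r + dist) % num_procs) for r in range(num_procs)])
        dist <<= 1
    return rounds


def hockney_time(num_rounds, msg_bytes, alpha, beta):
    """Assumed cost: each round costs alpha + beta * msg_bytes."""
    return num_rounds * (alpha + beta * msg_bytes)


if __name__ == "__main__":
    P = 16
    bcast = binomial_bcast_schedule(P)
    barrier = dissemination_barrier_schedule(P)
    print(len(bcast), len(barrier))
    # alpha and beta are placeholder values, not measurements.
    print(hockney_time(len(bcast), msg_bytes=1000, alpha=1e-6, beta=1e-9))
```

Both schedules take ceil(log2 P) rounds, which is why models of this kind scale the per-message cost by a logarithmic factor in the number of processes.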