2022
DOI: 10.48550/arxiv.2205.04190
Preprint

The Role of Idle Waves, Desynchronization, and Bottleneck Evasion in the Performance of Parallel Programs

Abstract: The performance of highly parallel applications on distributed-memory systems is influenced by many factors. Analytic performance modeling techniques aim to provide insight into performance limitations and are often the starting point of optimization efforts. However, coupling analytic models across the system hierarchy (socket, node, network) fails to encompass the intricate interplay between the program code and the hardware, especially when execution and communication bottlenecks are involved. In this paper…

Cited by 1 publication (2 citation statements)
References 26 publications (48 reference statements)
“…The histograms in Figure 3 sort the MPI time values of end-to-end (500 k iterations) runs of Bench [1][2][3][4]A into 35 bins. For memory-bound code, idle times are lower for desynchronized processes if the bandwidth saturation on a ccNUMA domain is weaker [2] (see Figure 3(c)).…”
Section: Rank/ccNUMA-wise Timelines and Histogram of MPI Time and Per... (mentioning)
confidence: 99%
“…As a consequence, such programs settle in a metastable state, a computational wavefront, where neighboring processes are shifted in time with respect to each other (Figure 2 (right)). It was also shown [4] that this desynchronization can lead to substantial speedups via automatic overlap of communication and code execution.…”
Section: Introduction and Related Work (mentioning)
confidence: 99%