2008
DOI: 10.1007/978-3-540-88140-7_19

Automatic Transformation for Overlapping Communication and Computation

Abstract: Message-passing is a predominant programming paradigm for distributed-memory systems. RDMA networks such as InfiniBand and Myrinet reduce communication overhead by overlapping communication with computation. To make this overlap more effective, we propose a source-to-source transformation scheme that automatically restructures message-passing codes. Extensions to the control-flow graph allow the message-passing program to be analyzed accurately and support effective data-flow analysis. This analysis ident…
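The overlap the abstract describes — initiate a transfer, do independent computation while it is in flight, then complete the transfer before the data is used — can be sketched outside MPI. The following is a minimal illustration, assuming a background thread as a stand-in for a non-blocking receive (real codes would post `MPI_Irecv`, compute, then call `MPI_Wait`; the names and data here are illustrative, not from the paper):

```python
import threading

# "remote" data that a non-blocking receive would deliver into "local"
remote = list(range(1000))
local = [0] * 1000

def transfer():
    local[:] = remote  # the "communication", running in the background

# post the transfer (analogous to MPI_Irecv)
t = threading.Thread(target=transfer)
t.start()

# independent computation proceeds while the transfer is in flight
acc = sum(i * 0.5 for i in range(1000))

# complete the transfer before touching the received data
# (analogous to MPI_Wait)
t.join()

print(acc, local[0], local[-1])  # → 249750.0 0 999
```

The correctness condition the paper's data-flow analysis must enforce is visible here: the computation between `start` and `join` must not read `local`, since the transfer may still be in progress.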

Cited by 3 publications (2 citation statements)
References 17 publications
“…There has also been significant work undertaken looking at optimising communications in MPI programs. A number of authors have looked at compiler based optimisations to provide automatic overlapping of communications and computation in existing parallel programs [10,6,11]. These approaches have shown that performance improvements can be obtained, generally evaluated against kernel benchmarks such as the NAS parallel benchmarks, by transforming user specified blocking communication code to non-blocking communication functionality, and using static compiler analysis to determine where the communications can be started and finished.…”
Section: Related Work
confidence: 99%
“…Other research has explored compiler-based MPI code optimizations [7], [8], or acceleration of MPI point-to-point communication [9]. Many approaches use code motion techniques to increase communication/computation overlap, an orthogonal technique that we could easily combine with our approach.…”
Section: Related Work
confidence: 99%
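The code-motion idea mentioned above amounts to hoisting the initiation of a transfer to the earliest point its data is ready, and sinking the completion to the last point before the data is used, widening the window in which computation can overlap. A small sketch, again using a Python thread as a hypothetical stand-in for `MPI_Isend`/`MPI_Wait` (all names here are illustrative):

```python
import threading

buf = []  # destination buffer at the "receiver"

def start_send(data):
    # stand-in for MPI_Isend: kick off the transfer in the background
    handle = threading.Thread(target=lambda: buf.extend(data))
    handle.start()
    return handle

def wait_send(handle):
    # stand-in for MPI_Wait: block until the transfer completes
    handle.join()

# Before transformation, initiation and wait would be adjacent
# (blocking send: no overlap). After code motion, independent
# computation sits between them:
h = start_send([1, 2, 3])               # hoisted initiation
partial = sum(x * x for x in range(100))  # independent work overlaps
wait_send(h)                            # sunk completion, just before use

print(partial, buf)  # → 328350 [1, 2, 3]
```

The compiler analyses cited above exist precisely to prove that the moved initiation and completion cross no statement that reads or writes the message buffer.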