Optimizing Metacomputing with Communication-Computation Overlap

Baude, Françoise; Caromel, Denis; Furmento, Nathalie; Sagnol, David

doi:10.1007/3-540-44743-1_19

Cited by 6 publications

(4 citation statements)

References 15 publications

“…Several studies showed that the performance of parallel applications can be significantly enhanced with overlapping techniques (cf. [1,3,6,9,33]). …”

Section: Introduction (mentioning)

confidence: 99%

Implementation and performance analysis of non-blocking collective operations for MPI

Hoefler, Lumsdaine, Rehm (2007)

Proceedings of the 2007 ACM/IEEE Conference on Supercomputing

Collective operations and non-blocking point-to-point operations have always been part of MPI. Although non-blocking collective operations are an obvious extension to MPI, there have been no comprehensive studies of this functionality. In this paper we present LibNBC, a portable high-performance library for implementing non-blocking collective MPI communication operations. LibNBC provides non-blocking versions of all MPI collective operations, is layered on top of MPI-1, and is portable to nearly all parallel architectures. To measure the performance characteristics of our implementation, we also present a microbenchmark for measuring both latency and overlap of computation and communication. Experimental results demonstrate that the blocking performance of the collective operations in our library is comparable to that of collective operations in other high-performance MPI implementations. Our library introduces a very low overhead between the application and the underlying MPI and thus, in conjunction with the potential to overlap communication with computation, offers the potential for optimizing real-world applications.
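The start/compute/wait pattern that this abstract refers to as overlapping communication with computation can be illustrated with a minimal sketch. This is not LibNBC or MPI code: a Python worker thread stands in for the communication step, and the names `fake_collective` and `overlapped` are hypothetical, chosen only for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_collective(data, delay=0.05):
    """Hypothetical stand-in for a communication step.

    Sleeps to imitate network latency, then returns the sum of `data`
    (loosely imitating a reduction). No real MPI call is made.
    """
    time.sleep(delay)
    return sum(data)


def overlapped(data, work_items):
    """Start the 'communication', compute independently, then wait."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        # Start: returns immediately with a handle (a Future).
        handle = pool.submit(fake_collective, data)
        # Overlap: computation that does not depend on the result.
        local = sum(x * x for x in work_items)
        # Wait: blocks only if the 'communication' has not finished yet.
        total = handle.result()
    return total, local


if __name__ == "__main__":
    print(overlapped([1, 2, 3], [1, 2, 3]))
```

The gain comes from doing independent work between the start and the wait; if the computation needs the communicated result immediately, nothing can be hidden.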

“…Brightwell et al [7] classifies the source of performance advantage for overlap and Dimitrov [8] uses overlapping as fundamental approach to optimize parallel applications for cluster systems. Other studies, as [9,10,11,12] apply several transformations to parallel codes to enable overlapping. However, little research has been done in the field of non-blocking collectives.…”

Section: Related Work (mentioning)

confidence: 99%

A Case for Non-blocking Collective Operations

Hoefler, Squyres, Rehm, et al. (2006)

Frontiers of High Performance Computing and Networking – ISPA 2006 Workshops

Abstract. Non-blocking collective operations for MPI have been in discussion for a long time. We want to contribute to this discussion and to give a rationale for the usage of these operations and assess their possible benefits. A LogGP model for the CPU overhead of collective algorithms and a benchmark to measure it are provided and show a large potential to overlap communication and computation. We show that non-blocking collective operations can provide at least the same benefits as non-blocking point-to-point operations already do. Our claim is that actual CPU overhead for non-blocking collective operations depends on the message size and the communicator size and benefits especially highly scalable applications with huge communicators. We prove that the share of the overhead of the overall communication time of current blocking collective operations gets smaller with bigger communicators and larger messages. We show that the user level CPU overhead is less than 10% for MPICH2 and LAM/MPI using TCP/IP communication, which leads us to the conclusion that, by using non-blocking collective communication, ideally 90% idle CPU time can be freed for the application.
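The arithmetic behind the closing claim (under 10% CPU overhead, so ideally 90% of the communication time is free for the application) can be written out as a small sketch. The timing model in `ideal_overlapped_time` is an assumption made here for illustration, not the LogGP model from the paper, and both function names are hypothetical.

```python
def overlappable_fraction(cpu_overhead):
    """Fraction of communication time during which the CPU is idle.

    `cpu_overhead` is the share of communication time spent in CPU work,
    given as a number between 0 and 1 (0.10 for the 10% in the abstract).
    """
    if not 0.0 <= cpu_overhead <= 1.0:
        raise ValueError("cpu_overhead must be between 0 and 1")
    return 1.0 - cpu_overhead


def ideal_overlapped_time(t_comm, t_comp, cpu_overhead):
    """Idealized total time with perfect overlap (assumed model).

    The CPU-bound share of communication cannot be hidden; the remaining
    share runs concurrently with the computation, so the longer of the
    two determines the rest of the elapsed time.
    """
    busy = cpu_overhead * t_comm
    hidden = overlappable_fraction(cpu_overhead) * t_comm
    return busy + max(t_comp, hidden)


if __name__ == "__main__":
    print(overlappable_fraction(0.10))
    print(ideal_overlapped_time(10.0, 9.0, 0.10))
```

Under this assumed model, 10 time units of communication and 9 of computation at 10% overhead take about 10 units in total instead of the 19 needed when the two run back to back.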

“…This section presents the concepts and an implementation of an overlapping mechanism between communication and computation [7]. This mechanism allows to decrease the execution time of a remote method invocation, especially in the context of important transfers, such as matrices.…”

Section: Overlapping Communication With Computation (mentioning)

confidence: 99%

Distributed Objects for Parallel Numerical Applications

Baude, Caromel, Sagnol (2002)

ESAIM: M2AN

Self Cite

Abstract. The C++// language (pronounced C++ parallel) was designed and implemented with the aim of importing reusability into parallel and concurrent programming, in the framework of a MIMD model. From a reduced set of rather simple primitives, comprehensive and versatile libraries are defined. In the absence of any syntactical extension, the C++// user writes standard C++ code. The libraries are themselves extensible by the final users, making C++// an open system. Two specific techniques to improve performances of a distributed object language such as C++// are then presented: Shared-on-Read and Overlapping of Communication and Computation. The appliance of those techniques is guided by the programmer at a very high-level of abstraction, so the additional work to yield those good performance improvements is kept to the minimum.

Mathematics Subject Classification. 68N15, 68N19.
