Message‐passing performance of various computers

Dongarra, Jack; Dunigan, T.H.

doi:10.1002/(sici)1096-9128(199710)9:10<915::aid-cpe277>3.3.co;2-3

Cited by 12 publications

(12 citation statements)

References 3 publications

(3 reference statements)

“…For the SGI Power Challenge, t_el = 10 μs and bNBW = 64 MB/s [19]. This is, strictly speaking, the value for uNBW but since communications take place via shared memory, the value of bNBW ≈ 66 MB/s (dcopy() - which we calculated directly) suggesting that uNBW = bNBW for this architecture.…”

Section: Complete Exchange (mentioning)

confidence: 79%

Communication patterns and models in prism

Evangelinos

Karniadakis

1996

Proceedings of the 1996 ACM/IEEE Conference on Supercomputing

In this paper we analyze communication patterns in the parallel three-dimensional Navier-Stokes solver Prism, and present performance results on the IBM SP2, the Cray T3D and the SGI Power Challenge XL. Prism is used for direct numerical simulation of turbulence in non-separable and multiply-connected domains. The numerical method used in the solver is based on mixed spectral element-Fourier expansions in (x, y) planes and the z-direction, respectively. Each (or a group) of Fourier modes is computed on a separate processor as the linear contributions (Helmholtz solves) are completely uncoupled in the incompressible Navier-Stokes equations; coupling is obtained via the nonlinear contributions (convective terms). The transfer of data between physical and Fourier space requires a series of complete exchange operations, which dominate the communication cost for small number of processors. As the number of processors increases, global reduction and gather operations become important while complete exchange becomes more latency dominated. Predictive models for these communication operations are proposed and tested against measurements. A relatively large variation in communication timings per iteration is observed in simulations and quantified in terms of specific operations. A number of improvements are proposed that could significantly reduce the communications overhead with increasing numbers of processors, and generic predictive maps are developed for the complete exchange operation, which remains the fundamental communication in Prism. Results presented in this paper are representative of a wider class of parallel spectral and finite element codes for computational mechanics which require similar communication operations.

Figure 2: Data Layout in Prism. Here we have chosen to store a Fourier mode (2 "Fourier planes") per node. This also means that we keep 2 "Physical Planes" per node. Because this is a Real-to-Complex FFT, N_z = 2^m = 2P "Physical Planes" map to P + 1 independent Fourier modes (0 to N_z/2 = P) as the other P − 1 modes (N_z/2 + 1 to N_z − 1) are fixed by symmetry. Of these P + 1 modes, the first and last one have vanishing imaginary parts, hence by "packing" the real part of mode P in place of the imaginary part of mode 0 we are left with N_z = 2P "Fourier planes" as well.

The communication that the code is based on is the Global or Complete Exchange. This is a communication pattern of great importance to any 2-D or 3-D FFT-based solver as well, since it lies behind the transposition of a distributed matrix. It is also used in multiplying distributed matrices when one or more of the matrices is specified to be in transposed form. In the case of Prism it is used to move the data between Fourier and Physical space: For most of the calculation, the flow variables are in Fourier Space distributed in "Fourier planes" among the nodes, arranged according to the Fourier mode they correspond to. However when the need to form the non-linear products (Pass I) ...
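The packing described in the Figure 2 caption can be illustrated with a minimal sketch. This is not code from Prism: the function names (`real_dft`, `pack_modes`, `unpack_modes`) and the sample data are hypothetical, and a naive O(N^2) transform stands in for a real FFT library so that the example stays self-contained.

```python
import cmath

def real_dft(samples):
    """Naive DFT of a real sequence; returns only modes 0..N/2 (the rest follow by symmetry)."""
    n = len(samples)
    return [
        sum(x * cmath.exp(-2j * cmath.pi * k * idx / n) for idx, x in enumerate(samples))
        for k in range(n // 2 + 1)
    ]

def pack_modes(modes):
    """Store Re(mode P) in the (vanishing) imaginary slot of mode 0: P + 1 modes become P values."""
    p = len(modes) - 1
    packed = [complex(modes[0].real, modes[p].real)]
    packed.extend(modes[1:p])
    return packed

def unpack_modes(packed):
    """Inverse of pack_modes: recover the P + 1 independent modes."""
    first = complex(packed[0].real, 0.0)
    last = complex(packed[0].imag, 0.0)
    return [first] + list(packed[1:]) + [last]

# Hypothetical data: N_z = 8 physical values, so P = 4.
planes = [1.0, 2.0, 0.5, -1.0, 3.0, 0.0, -2.0, 4.0]
modes = real_dft(planes)    # P + 1 = 5 independent Fourier modes
packed = pack_modes(modes)  # P = 4 complex values, i.e. 2P = N_z real numbers
restored = unpack_modes(packed)
```

After packing, the Fourier-space representation occupies the same 2P real slots as the physical-space data, which is what lets both layouts be distributed evenly over the nodes.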

“…We did not focus on this problem here and thereby, in this paper we used a simple linear model which was used in [48] to analyze the performance of different collective operation algorithms, f_p2p(m) = α + mβ, where α are startup code (or latency), independent of message size, β is the transfer time per byte and m is message size. The common technique to measure the parameters α and β is to use a ping-pong micro benchmark as the one given in [49].…”

Section: B Performance Prediction and Experimental Results (mentioning)

confidence: 99%

Reliable Multicast Protocol in Distributed Simulation for Multi-agent Systems

Hai

2009

2009 IEEE-RIVF International Conference on Computing and Communication Technologies

Using computational clusters to simulate multi-agent systems has attracted considerable interest in recent years because it makes investigations of large complex systems possible. Many factors, however, such as load balancing among the nodes, communication latencies, or the synchronization of logical processes, can affect performance. This paper focuses on minimizing communication costs by proposing a reliable multicast mechanism to reduce communication latency, which promises to improve performance.
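The linear point-to-point model quoted in the citation statement above, f_p2p(m) = α + mβ, can be fitted to ping-pong timings with ordinary least squares. The sketch below is illustrative only: the helper names (`fit_linear_model`, `predict`) are hypothetical, and the timings are synthetic values generated from assumed parameters, not measurements from any of the cited papers.

```python
def fit_linear_model(sizes, times):
    """Ordinary least-squares fit of t = alpha + m * beta."""
    n = len(sizes)
    mean_m = sum(sizes) / n
    mean_t = sum(times) / n
    sxx = sum((m - mean_m) ** 2 for m in sizes)
    sxy = sum((m - mean_m) * (t - mean_t) for m, t in zip(sizes, times))
    beta = sxy / sxx                 # transfer time per byte
    alpha = mean_t - beta * mean_m   # startup cost (latency)
    return alpha, beta

def predict(alpha, beta, m):
    """Predicted one-way time for an m-byte message under the linear model."""
    return alpha + m * beta

# Synthetic one-way times in seconds (half of a ping-pong round trip),
# generated from assumed parameters purely for illustration.
sizes = [0, 1024, 4096, 16384, 65536]
true_alpha, true_beta = 50e-6, 1e-8
times = [true_alpha + m * true_beta for m in sizes]

alpha, beta = fit_linear_model(sizes, times)
```

With real measurements the fit would typically be done over many repetitions per message size, and 1/β gives the asymptotic bandwidth implied by the model.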

“…If we compare with the performance numbers presented in Table 1, our performance measurements are a little under half the best FDDI performance, as measured in [6]. It represents the best performance we could achieve on the FDDI network through the PVM communication library.…”

Section: Experimental FDDI Throughput With PVM Protocols (mentioning)

confidence: 96%

“…the number of bytes which could be transmitted during latency time on various parallel architectures and network protocols (Table 1). The numbers in the first two columns of Table 2 are all derived from Dongarra [6]. The third column is a measure of the network quality (throughput divided by latency).…”

Section: Theoretical CAP Token Overhead (mentioning)

confidence: 99%

Computer-assisted generation of PVM/C++ programs using CAP

Gennart

Giménez

Hersch

1996

Parallel Virtual Machine — EuroPVM '96

Parallelizing an algorithm consists of dividing the computation into a set of sequential operations, assigning the operations to threads, synchronizing the execution of threads, specifying the data transfer requirements between threads and mapping the threads onto processors. With current software technology, writing a parallel program executing the parallelized algorithm involves mixing sequential code with calls to a communication library such as PVM, both for communication and synchronization. This contribution introduces CAP (Computer-Aided Parallelization), a language extension to C++, from which C++/PVM programs are automatically generated. CAP allows the programmer to specify (1) the threads in a parallel program, (2) the messages exchanged between threads, and (3) the ordering of sequential operations required to complete a parallel task. All CAP operations (sequential and parallel) have a single input and a single output, and no shared variables. CAP completely separates the computation description from the communication and synchronization specification. From the CAP specification, an MPMD (multiple program multiple data) program is generated that executes on the various processing elements of the parallel machine. This contribution illustrates the features of the CAP parallel programming extension to C++. We demonstrate the expressive power of CAP and the performance of CAP-specified applications.
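The two derived quantities mentioned in the citation statements above (the number of bytes that could be transmitted during the latency time, and network quality as throughput divided by latency) are simple arithmetic on a latency and throughput pair. The sketch below uses hypothetical function names and placeholder values; it does not reproduce any entry from the cited tables.

```python
def bytes_during_latency(latency_s, throughput_bytes_per_s):
    """Bytes that could have been sent in the time one message startup takes."""
    return latency_s * throughput_bytes_per_s

def network_quality(latency_s, throughput_bytes_per_s):
    """Throughput divided by latency; higher means a better network."""
    return throughput_bytes_per_s / latency_s

# Placeholder values for illustration only: 100 microseconds, 10 MB/s.
latency = 100e-6
throughput = 10e6

lost_bytes = bytes_during_latency(latency, throughput)
quality = network_quality(latency, throughput)
```

A large value of `lost_bytes` indicates that small messages are dominated by startup cost, which is why both quantities are useful alongside the raw latency and throughput figures.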
