Large-scale parallel machines are incorporating increasingly sophisticated architectural support for user-level messaging and global memory access. We provide a systematic evaluation of a broad spectrum of current design alternatives based on our implementations of a global address language on the Thinking Machines CM-5, Intel Paragon, Meiko CS-2, Cray T3D, and Berkeley NOW. This evaluation includes a range of compilation strategies that make varying use of the network processor; each is optimized for the target architecture and the particular strategy. We analyze a family of interacting issues that determine the performance tradeoffs in each implementation, quantify the resulting latency, overhead, and bandwidth of the global access operations, and demonstrate the effects on application performance.
Introduction

In recent years several architectures have demonstrated practical scalability beyond a thousand microprocessors, including the nCUBE/2, Thinking Machines CM-5, Intel Paragon, Meiko CS-2, and Cray T3D. More recently, researchers have also demonstrated high-performance communication in Networks of Workstations (NOW) using scalable switched local area network technology [28, 6, 12]. While the dominant programming model at this scale is message passing, the primitives used are inherently expensive, due to buffering and scheduling overheads [29]. Consequently, these machines provide varying levels of architectural support for communication in a global address space via various forms of memory read and write.

We developed the Split-C language to allow experimentation with new communication hardware mechanisms by involving the compiler in the support for the global address operations. Global memory operations are statically typed, so the Split-C compiler can generate a short sequence of code for each potentially remote operation as required by
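To make the compiler's role concrete, the following fragment is a minimal sketch of the kind of global access Split-C supports. The `global` pointer qualifier, the split-phase assignment operator `:=`, and `sync()` come from the Split-C design; the identifiers themselves are illustrative, and the code is not ANSI C (it requires the Split-C compiler).

```
/* Illustrative Split-C fragment (not ANSI C). The `global` qualifier
 * marks a pointer that may refer to memory on a remote processor,
 * represented as a (processor, local address) pair. */
int *global gp;   /* potentially remote pointer */
int x, y;

x = *gp;          /* blocking read: the compiler emits a short remote-get
                     sequence and waits for the reply */

y := *gp;         /* split-phase read: issue the get and continue,
                     overlapping communication with local work */
/* ... independent computation here ... */
sync();           /* complete all outstanding split-phase operations;
                     y is valid only after this point */
```

Because the type of `gp` is known statically, the compiler can specialize the generated code sequence to whatever communication mechanism the target machine provides, rather than routing every access through a generic runtime call.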