This paper evaluates remote memory access (RMA) communication capabilities and performance on the Cray XT3. We discuss properties of the network hardware and the Portals networking software layer, along with the corresponding implementation issues for the portable SHMEM and ARMCI RMA interfaces. The performance of these interfaces is studied and compared to MPI performance.
The Cray Gemini Interconnect has recently been introduced as the next-generation network for building scalable multi-petascale supercomputers. The Cray XE6 systems, which use the Gemini Interconnect, are becoming available with the Message Passing Interface (MPI) and Partitioned Global Address Space (PGAS) models such as Global Arrays, Unified Parallel C, Co-Array Fortran, and Chapel (the Cascade High Productivity Language). These PGAS models use one-sided communication runtime systems such as MPI Remote Memory Access, the Aggregate Remote Memory Copy Interface (ARMCI), and proprietary communication runtime systems. The primary objective of our work is to study the potential of the Cray Gemini Interconnect by designing application-specific micro-benchmarks using the DMAPP userspace library. We design micro-benchmarks to study the performance of simple communication primitives, and application-specific micro-benchmarks to understand the behavior of the Gemini Interconnect at scale. In our experiments, the Gemini Interconnect achieves a peak bandwidth of 6911 MB/s and a latency of 1µs for the get communication primitive. Scalability tests for atomic memory operations and the shift communication operation up to 65536 processes show the efficacy of the Gemini Interconnect.
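A latency micro-benchmark of the kind described typically times many repetitions of the communication primitive after a warm-up phase and reports the mean per-operation time. The sketch below shows only that measurement-loop structure; a local 8-byte memory copy stands in for the actual get primitive, since the real `dmapp_*` calls are Cray-specific and not assumed here.

```python
import time

def measure_latency(op, iters=10000, warmup=100):
    """Return mean per-call latency of op() in seconds."""
    for _ in range(warmup):       # warm caches / complete connection setup
        op()
    t0 = time.perf_counter()
    for _ in range(iters):        # timed region covers many repetitions
        op()
    return (time.perf_counter() - t0) / iters

# Placeholder "get": copy an 8-byte payload, a typical latency-test size.
src = bytearray(b"\x01" * 8)
dst = bytearray(8)
lat = measure_latency(lambda: dst.__setitem__(slice(0, 8), src))
```

In a real benchmark the lambda would wrap the remote get call, and bandwidth tests would repeat the same loop over increasing message sizes.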
Power system transient stability analysis computes the response of the rapidly changing electrical components of a power system to a sequence of large disturbances, followed by operations to protect the system against those disturbances. Transient stability analysis involves repeatedly solving large, very sparse, time-varying non-linear systems over thousands of time steps. In this paper, we present parallel implementations of the transient stability problem in which we use direct methods to solve the linearized systems. One method uses factorization and forward and backward substitution to solve the linear systems. Another method, known as the W-Matrix method, uses factorization and partitioning to increase the amount of parallelism during the solution phase. The third method, the Repeated Substitution method, uses factorization and computations which can be done ahead of time to further increase the amount of parallelism during the solution phase. We discuss the performance of the different methods implemented on a loosely coupled, heterogeneous network of workstations (NOW) and the SP2 cluster of workstations. SC '95, San Diego, CA.
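The first method described rests on the standard direct-solve kernel: factor the sparse matrix once, then solve each linear system by forward and backward substitution. A minimal sketch of that sequential kernel (dense, without pivoting, purely illustrative; not the paper's parallel implementation) is:

```python
def lu_factor(A):
    """Doolittle LU factorization without pivoting; returns (L, U) with A = L U."""
    n = len(A)
    L = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    U = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):            # row i of U
            U[i][j] = A[i][j] - sum(L[i][k] * U[k][j] for k in range(i))
        for j in range(i + 1, n):        # column i of L
            L[j][i] = (A[j][i] - sum(L[j][k] * U[k][i] for k in range(i))) / U[i][i]
    return L, U

def forward_sub(L, b):
    """Solve L y = b for unit lower-triangular L."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        y[i] = b[i] - sum(L[i][k] * y[k] for k in range(i))
    return y

def backward_sub(U, y):
    """Solve U x = y for upper-triangular U."""
    n = len(y)
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (y[i] - sum(U[i][k] * x[k] for k in range(i + 1, n))) / U[i][i]
    return x

# Factor once, then each time step only needs the two triangular solves.
A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
L, U = lu_factor(A)
x = backward_sub(U, forward_sub(L, b))   # x solves A x = b
```

The triangular solves are inherently sequential along dependency chains, which is exactly the bottleneck the W-Matrix and Repeated Substitution methods attack by trading extra precomputed work for more parallelism in the solution phase.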