Amith R. Mamidala scite author profile

Amith R. Mamidala

5 Publications

178 Citation Statements Received

31 Citation Statements Given

Affiliations

IBM Research - Thomas J. Watson Research Center, The Ohio State University, IBM (United States)

Publications

PAMI: A Parallel Active Message Interface for the Blue Gene/Q Supercomputer

Kumar, Mamidala, Faraj, et al. 2012

The Blue Gene/Q machine is the next generation in the line of IBM massively parallel supercomputers, designed to scale to 262144 nodes and sixteen million threads. With each BG/Q node having 68 hardware threads, hybrid programming paradigms, which use message passing among nodes and multi-threading within nodes, are ideal and will enable applications to achieve high throughput on BG/Q. With such unprecedented massive parallelism and scale, this paper is a groundbreaking effort to explore the design challenges for designing a communication library that can match and exploit such massive parallelism. In particular, we present the Parallel Active Messaging Interface (PAMI) library as our BG/Q library solution to the many challenges that come with a machine at such scale. PAMI provides (1) novel techniques to partition the application communication overhead into many contexts that can be accelerated by communication threads; (2) client and context objects to support multiple and different programming paradigms; (3) lockless algorithms to speed up MPI message rate; and (4) novel techniques leveraging the new BG/Q architectural features such as the scalable atomic primitives implemented in the L2 cache, the highly parallel hardware messaging unit that supports both point-to-point and collective operations, and the collective hardware acceleration for operations such as broadcast, reduce, and allreduce. We experimented with PAMI on 2048 BG/Q nodes and the results show high messaging rates as well as low latencies and high throughputs for collective communication operations.

Looking under the hood of the IBM Blue Gene/Q network

Chen, Eisley, Heidelberger, et al. 2012

MPI Collective Communications on The Blue Gene/P Supercomputer: Algorithms and Optimizations

Faraj, Kumar, Smith, et al. 2009

MPI Collectives on Modern Multicore Clusters: Performance Optimizations and Communication Characteristics

Mamidala, Kumar, et al. 2008

Hot-Spot Avoidance With Multi-Pathing Over InfiniBand: An MPI Perspective

Vishnu, Koop, Moody, et al. 2007

Large scale InfiniBand clusters are becoming increasingly popular, as reflected by the TOP 500 Supercomputer rankings. At the same time, fat tree has become a popular interconnection topology for these clusters, since it allows multiple paths to be available in between a pair of nodes. However, even with fat tree, hot-spots may occur in the network depending upon the route configuration between end nodes and communication pattern(s) in the application. To make matters worse, the deterministic routing nature of InfiniBand keeps the application from transparently making effective use of multiple paths and avoiding the hot-spots in the network. Simulation based studies for switches and adapters to implement congestion control have been proposed in the literature. However, these studies have focussed on providing congestion control for the communication path, and not on utilizing multiple paths in the network for hot-spot avoidance. In this paper, we design an MPI functionality, which provides hot-spot avoidance for different communications, without a priori knowledge of the pattern. We leverage LMC (LID Mask Count) mechanism of InfiniBand to create multiple paths in the network and present the design issues (scheduling policies, selecting number of paths, scalability aspects) of our design. We implement our design and evaluate it with Pallas collective communication and MPI applications. On an InfiniBand cluster with 48 processes, collective operations like MPI All-to-all Personalized and MPI Reduce Scatter show an improvement of 27% and 19% respectively. Our evaluation with MPI applications like NAS Parallel Benchmarks and PSTSWM on 64 processes shows significant improvement in execution time with this functionality.
