A Software Based Approach for Providing Network Fault Tolerance in Clusters with uDAPL interface: MPI Level Design and Performance Evaluation

Vishnu, Gupta, Mamidala, Panda

doi:10.1109/sc.2006.5

Cited by 12 publications

(7 citation statements)

References 3 publications

“…There are several published studies on multi-method MPIs, including [4,11,12,17,30,36]. Most of these assume static configurations of available communication methods.…”

Section: Related Work (mentioning)

confidence: 99%

Virtual machine aware communication libraries for high performance computing

Huang

Koop

Gao

et al. 2007

Proceedings of the 2007 ACM/IEEE Conference on Supercomputing

103

As the size and complexity of modern computing systems keep increasing to meet the demanding requirements of High Performance Computing (HPC) applications, manageability is becoming a critical concern to achieve both high performance and high productivity computing. Meanwhile, virtual machine (VM) technologies have become popular in both industry and academia due to various features designed to ease system management and administration. While a VM-based environment can greatly help manageability on large-scale computing systems, concerns over performance have largely blocked the HPC community from embracing VM technologies. In this paper, we follow three steps to demonstrate the ability to achieve near-native performance in a VM-based environment for HPC. First, we propose Inter-VM Communication (IVC), a VM-aware communication library to support efficient shared memory communication among computing processes on the same physical host, even though they may be in different VMs. This is critical for multi-core systems, especially when individual computing processes are hosted on different VMs to achieve fine-grained control. Second, we design a VM-aware MPI library based on MVAPICH2 (a popular MPI library), called MVAPICH2-ivc, which allows HPC MPI applications to transparently benefit from IVC. Finally, we evaluate MVAPICH2-ivc on clusters featuring multi-core systems and high performance InfiniBand interconnects. Our evaluation demonstrates that MVAPICH2-ivc can improve NAS Parallel Benchmark performance by up to 11% in VM-based environment on eight-core Intel Clovertown systems, where each compute process is in a separate VM. A detailed performance evaluation for up to 128 processes (64 node dual-socket single-core systems) shows only a marginal performance overhead of MVAPICH2-ivc as compared with MVAPICH2 running in a native environment.

“…Most of these assume static configurations of available communication methods. Some of them support switching communication methods at runtime, but the main purpose is network fail-over [11,12,36]. MVAPICH2-ivc is designed for an environment where available communication methods may change due to migration.…”

Section: Related Work (mentioning)

confidence: 99%

“…In our previous work, we have designed MPI-2 one sided communication using multi-rail InfiniBand networks [14]. Handling network heterogeneity and network faults with asynchronous recovery of previously failed paths has also been presented [13]. However, the above works have focused on design and evaluation with multi-rail networks on the end nodes (multiple ports, multiple adapters), rather than the network.…”

Section: Related Work (mentioning)

confidence: 99%

Hot-Spot Avoidance With Multi-Pathing Over InfiniBand: An MPI Perspective

Vishnu

Koop

Moody

et al. 2007

Seventh IEEE International Symposium on Cluster Computing and the Grid (CCGrid '07)

Large scale InfiniBand clusters are becoming increasingly popular, as reflected by the TOP 500 Supercomputer rankings. At the same time, fat tree has become a popular interconnection topology for these clusters, since it allows multiple paths to be available in between a pair of nodes. However, even with fat tree, hot-spots may occur in the network depending upon the route configuration between end nodes and communication pattern(s) in the application. To make matters worse, the deterministic routing nature of InfiniBand limits the application from effective use of multiple paths transparently and avoid the hot-spots in the network. Simulation based studies for switches and adapters to implement congestion control have been proposed in the literature. However, these studies have focussed on providing congestion control for the communication path, and not on utilizing multiple paths in the network for hot-spot avoidance. In this paper, we design an MPI functionality, which provides hot-spot avoidance for different communications, without a priori knowledge of the pattern. We leverage LMC (LID Mask Count) mechanism of InfiniBand to create multiple paths in the network and present the design issues (scheduling policies, selecting number of paths, scalability aspects) of our design. We implement our design and evaluate it with Pallas collective communication and MPI applications. On an InfiniBand cluster with 48 processes, collective operations like MPI All-to-all Personalized and MPI Reduce Scatter show an improvement of 27% and 19% respectively. Our evaluation with MPI applications like NAS Parallel Benchmarks and PSTSWM on 64 processes shows significant improvement in execution time with this functionality.

“…A network or machine failure can be detected by checking the completion queue entries. In [1,2], a similar method was used to detect network failure.…”

Section: Active Detection Of a Machine Crash Or Network Failure (mentioning)

confidence: 99%

“…The design rationale for these studies has been that reliable remote memory connected with high speed interconnects are better than a single big machine in terms of cost-effectiveness.[1] [Footnote 1: The widespread architecture adopted by vendors showing top ten TPC-C results is a clustered architecture, not a big mainframe.]…”

Section: Introduction (mentioning)

confidence: 99%

Performance evaluation of a remote memory system with commodity hardware for large-memory data processing

et al. 2011

The explosion of data and transactions demands a creative approach for data processing in a variety of applications. Research on remote memory systems (RMSs), so as to exploit the superior characteristics of dynamic random access memory (DRAM), has been performed for many decades, and today's information explosion galvanizes researchers into shedding new light on the technology. Prior studies have mainly focused on architectural suggestions for such systems, highlighting different design rationale. These studies have shown that choosing the appropriate applications to run on an RMS is important in fully utilizing the advantages of remote memory. This article provides an extensive performance evaluation for various types of data processing applications so as to address the efficacy of an RMS by means of a prototype RMS with reliability functionality. The prototype RMS used is a practical kernel-level RMS that renders large memory data processing feasible. The abstract concept of remote memory was materialized by borrowing unused local memory in commodity PCs via a high speed network capable of Remote Direct Memory Access (RDMA) operations. The prototype RMS uses remote memory without any part of its computation power coming from remote computers. Our experimental results suggest that an RMS can be practical in supporting the rigorous demands of commercial in memory database systems that have high data access locality. Our evaluation also convinces us of the possibility that a reliable RMS can satisfy both the high degree of reliability and efficiency for large memory data processing applications whose data access pattern has high locality.
