As data sets from DOE user science facilities grow in both size and complexity, there is an urgent need for new capabilities to transfer, analyze, and manage the data underlying scientific discoveries. LBNL's Superfacility project brings together experimental and observational research instruments with computational and network facilities at the National Energy Research Scientific Computing Center (NERSC) and the Energy Sciences Network (ESnet), with the goal of enabling user science. Here, we report on recent innovations in the Superfacility project, including advanced data management, API-based automation, real-time interactive user interfaces, and supported infrastructure for "edge" services.
Abstract-Delay tolerant networks (DTNs) are a type of wireless mobile network that does not guarantee the existence of a path between a source and a destination at any given time. In such a network, one of the critical issues is to reliably deliver data with low latency. Naive forwarding approaches, such as flooding and its derivatives, make the routing cost (here defined as the number of copies duplicated for a message) very high. Many efforts have been made to reduce the cost while maintaining performance. Recently, an approach called delegation forwarding (DF) attracted significant attention in the research community because of its simplicity and good performance. In a network with N nodes, it reduces the cost to O(√N), which is better than the O(N) of other methods. In this paper, we extend the DF algorithm by putting forward a new scheme called probability delegation forwarding (PDF) that can further reduce the cost to O(N^(log_{2+2p}(1+p))), p ∈ (0, 1). Simulation results show that PDF can achieve a delivery ratio, which is the most important metric in DTNs, similar to the DF scheme at a lower cost if p is not too small. In addition, we propose the threshold probability delegation forwarding (TPDF) scheme to close the latency gap between the DF and PDF schemes.
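The delegation idea in the abstract above can be sketched in a few lines: a carrier forwards a copy only on meeting a node whose forwarding quality exceeds the highest quality the message has seen so far, and the probabilistic variant additionally forwards with probability p. This is a minimal illustrative sketch, not the paper's implementation; the random per-node quality metric, the precomputed contact trace, and the choice to raise the threshold even when a probabilistic forward is skipped are all assumptions made here for illustration.

```python
import random

def simulate_pdf(num_nodes, contacts, p=1.0, seed=1):
    """Count message copies made under (probability) delegation forwarding.

    Assumptions (not from the paper): each node i has a fixed random
    'quality' quality[i] standing in for its forwarding utility, and
    `contacts` is a precomputed list of pairwise meetings (a, b).
    """
    rng = random.Random(seed)
    quality = [rng.random() for _ in range(num_nodes)]
    carriers = {0}                 # node 0 is the source and first carrier
    threshold = {0: quality[0]}    # highest quality each carrier has seen
    copies = 1
    for a, b in contacts:
        for u, v in ((a, b), (b, a)):
            if u in carriers and v not in carriers and quality[v] > threshold[u]:
                # DF always forwards here; PDF forwards with probability p.
                if rng.random() < p:
                    carriers.add(v)
                    threshold[v] = quality[v]
                    copies += 1
                threshold[u] = quality[v]  # raise the bar either way (assumed)
    return copies
```

With p = 1 the loop reduces to plain delegation forwarding; with p = 0 no copy is ever made beyond the source, which brackets the cost range the PDF analysis interpolates between.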
Abstract-This work evaluates performance variability in the Cray Aries dragonfly network and characterizes its impact on MPI Allreduce. The execution time of Allreduce is limited by the performance of the slowest participating process, which can vary by more than an order of magnitude. We utilize counters from the network routers to provide a better understanding of how competing workloads can influence performance. Specifically, we examine the relationships between message size, process counts, Aries counters, and the Allreduce communication time. Our results suggest that competing traffic from other jobs can significantly impact performance on the Aries dragonfly network. Furthermore, we show that Aries network counters are a valuable tool, explaining up to 70% of the performance variability for our experiments on a large-scale production system.
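The claim that Allreduce time is bounded by the slowest participant can be illustrated with a toy straggler model. The congestion probability and slowdown factor below are assumed numbers for illustration only, not measurements from the Aries study:

```python
import random

def allreduce_time(per_process_times):
    """An Allreduce completes only when its slowest participant arrives,
    so the collective's time is the max over all processes."""
    return max(per_process_times)

def sample_times(nprocs, congestion_prob=0.05, base=1.0, slow_factor=20.0, seed=42):
    """Toy straggler model (illustrative, not measured): each process
    independently hits congested links with some probability and then
    runs slow_factor times slower than the base time."""
    rng = random.Random(seed)
    return [base * slow_factor if rng.random() < congestion_prob else base
            for _ in range(nprocs)]
```

With these assumed numbers, the chance that none of 1024 processes is slowed is (0.95)^1024 ≈ 10^-23, so at scale the collective almost always pays the full slowdown even though each individual process is usually fast, which is why per-process variability of an order of magnitude translates directly into Allreduce variability.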
Future exascale systems are under increased pressure to find power savings. The network, while it consumes a considerable amount of power, is often left out of the picture when discussing total system power. Even when network power is considered, the references are frequently a decade old or more and rely on models that lack validation on modern interconnects. In this work we explore how dynamic mechanisms of an InfiniBand network save power and at what granularity we can engage these features. We explore this within the context of the host channel adapter (HCA) on the node and for the fabric, i.e., switches, using three different mechanisms: dynamic link width, dynamic frequency, and disabling of links, on QLogic and Mellanox systems. Our results show that while there is some potential for modest power savings, real-world systems need improved responsiveness to adjustments in order to fully leverage these savings.
One-sided communication is crucial to enabling communication concurrency. As core counts have increased, particularly with manycore architectures, one-sided (RMA) communication has been proposed to address the ever-increasing contention at the network interface. The difficulty in using one-sided (RMA) communication with MPI is that the performance of MPI implementations using RMA with multiple concurrent threads is not well understood. Past studies have been done using MPI RMA in combination with multithreading (RMA-MT), but they were performed on older MPI implementations lacking RMA-MT optimizations. In addition, prior work has only been done at smaller scale (<=512 cores). In this paper, we describe a new RMA implementation for Open MPI. The implementation targets scalability and multi-threaded performance. We describe the design and implementation of our RMA improvements and offer an evaluation that demonstrates scaling to 524,288 cores, the full size of a leading supercomputer installation. In contrast, the previous implementation failed to scale past approximately 4,096 cores. To evaluate this approach, we then compare against a vendor-optimized MPI RMA-MT implementation with microbenchmarks, a mini-application, and a full astrophysics code at large scale on a many-core architecture. This is the first time that an evaluation at large scale on many-core architectures has been done for MPI RMA-MT (524,288 cores) and the first large