Jefferson Porter scite author profile

Jefferson Porter

4Publications

4Citation Statements Received

6Citation Statements Given

How they've been cited

How they cite others

Affiliations

Lawrence Berkeley National Laboratory

Publications

Order By: Most citations

STAR Data Reconstruction at NERSC/Cori, an adaptable Docker container approach for HPC

Mustafa

Balewski

Porter

et al. 2017

J. Phys.: Conf. Ser.

View full text Add to dashboard Cite

As HPC facilities grow their resources, adaptation of classic HEP/NP workflows becomes a need. Linux containers may very well offer a way to lower the bar to exploiting such resources and at the time, help collaboration to reach vast elastic resources on such facilities and address their massive current and future data processing challenges.In this proceeding, we showcase STAR data reconstruction workflow at Cori HPC system at NERSC. STAR software is packaged in a Docker image and runs at Cori in Shifter containers. We highlight two of the typical end-to-end optimization challenges for such pipelines: 1) data transfer rate which was carried over ESnet after optimizing end points and 2) scalable deployment of conditions database in an HPC environment. Our tests demonstrate equally efficient data processing workflows on Cori/HPC, comparable to standard Linux clusters.1 http://www.nersc.gov/users/computational-systems/cori/ arXiv:1702.06593v1 [physics.data-an]

show abstract

Physics Data Production on HPC: Experience to be efficiently running at scale

2020

View full text Add to dashboard Cite

The Solenoidal Tracker at RHIC (STAR) is a multi-national supported experiment located at the Brookhaven National Lab and is currently the only remaining running experiment at RHIC. The raw physics data captured from the detector is on the order of tens of PBytes per data acquisition campaign, making STAR fit well within the definition of a big data science experiment. The production of the data has typically run using a High Throughput Computing (HTC) approach either done on a local farm or via Grid computing resources. Especially, all embedding simulations (complex workflow mixing real and simulated events) have been run on standard Linux resources at NERSC’s Parallel Distributed Systems Facility (PDSF). However, as per April 2019 PDSF has been retired and High Performance Computing (HPC) resources such as the Cray XC-40 Supercomputer known as “Cori” have become available for STAR’s data production as well as embedding. STAR has been the very first experiment to show feasibility of running a sustainable data production campaign on this computing resource. In this contribution, we hope to share with the community the best practices for using such resource efficiently. The use of Docker containers with Shifter is the standard approach to run on HPC at NERSC – this approach encapsulates the environment in which a standard STAR workflow runs. From the deployment of a tailored Scientific Linux environment (with the set of libraries and special configurations required for STAR to run) to the deployment of third-party software and the STAR specific software stack, we’ve learned it has become impractical to rely on a set of containers comprising each specific software release. To this extent, a solution based on the CernVM File System (CVMFS) for the deployment of software and services has been deployed but it doesn’t stop there. One needs to make careful scalability considerations when using a resource like Cori, such as avoiding metadata lookups, scalability of distributed filesystems, and real limitations of containerized environments on HPC. Additionally, CVMFS clients are not compatible on Cori nodes and one needs to rely on an indirect NFS mount scheme using custom services known as DVS servers designed to forward data to worker nodes. In our contribution, we will discuss our strategies from the past and our current solution based on CVMFS. The second focus of our presentation will be to discuss strategies to find the most efficient use of database Shifter containers serving our data production (a near “database as a service” approach) and the best methods to test and scale your workflow efficiently.

show abstract

STAR Data Production Workflow on HPC: Lessons Learned & Best Practices

Poat

Porter

Balewski

2020

J. Phys.: Conf. Ser.

View full text Add to dashboard Cite

The Solenoidal Tracker at RHIC (STAR) is a multi-national supported experiment located at Brookhaven National Lab. The raw physics data captured from the detector is on the order of tens of PBytes per data acquisition campaign, which makes STAR fit well within the definition of a big data science experiment. The production of the data has typically run on standard nodes or on standard Grid computing environments. All embedding simulations (complex workflow mixing real and simulated events) have been run on standard Linux resources at the National Energy Research Scientific Computing Center (NERSC) aka PDSF. However, HPC resources such as Cori have become available for STAR’s data production as well as embedding, and STAR has been the very first experiment to show feasibility of running a sustainable data production campaign on this computing resource. The use of Docker containers with Shifter is required to run on HPC @ NERSC – this approach encapsulates the environment in which a standard STAR workflow runs. From the deployment of a tailored Scientific Linux environment (requiring many of its own libraries and special configurations required to run) to the deployment of third-party software and the STAR specific software stack, it has become impractical to rely on a set of containers containing each specific software release. To this extent, solutions based on the CERN VM File System (CVMFS) for the deployment of software and services have been employed in HENP, but one needs to make careful scalability considerations when using a resource like Cori, such as not allowing all software to be deployed in containers or bare node. Additionally, CVMFS clients are not compatible on Cori nodes and one needs to rely on an indirect NFS/DVS mount scheme. In our contribution, we will discuss our strategies from the past and our current solution based on CVMFS. Furthermore, running on HPC is not a simple task as each aspect of the workflow must be enabled to scale, run efficiently, and the workflow needs to fit within the boundaries of the provided queue system (SLURM in this case). Lastly, we will also discuss what we have learned so far about what is the best method for grouping jobs to maximize a single 48 core HPC node within a specific time frame and maximize our workflow efficiency.

show abstract

Data transfer for STAR grid jobs

Chakaberia

Poat²,

Porter³

2023

J. Phys.: Conf. Ser.

View full text Add to dashboard Cite

The Solenoidal Tracker at RHIC (STAR) is a multipurpose experiment at the Relativistic Heavy Ion Collider (RHIC) with the primary goal to study the formation and properties of the quark-gluon plasma. STAR is an international collaboration of member institutions and laboratories from around the world. Yearly data-taking period produces PBytes of raw data collected by the experiment. STAR primarily uses its dedicated facility at BNL to process this data, but has routinely leveraged distributed systems, both high throughput (HTC) and high performance (HPC) computing clusters, to significantly augment the processing capacity available to the experiment. The ability to automate the efficient transfer of large data sets on reliable, scalable, and secure infrastructure is critical for any large-scale distributed processing campaign. For more than a decade, STAR computing has relied upon GridFTP with its x509-based authentication to build such data transfer systems and integrate them into its larger production workflow. The end of support by the community for both GridFTP and the x509 standard requires STAR to investigate other approaches to meet its distributed processing needs. In this study we investigate two multi-purpose data distribution systems, Globus.org and XRootD, as alternatives to GridFTP. We compare both their performance and the ease by which each service is integrated into the type of secure and automated data transfer systems STAR has previously built using GridFTP. The presented approach and study may be applicable to other distributed data processing use cases beyond STAR.

show abstract

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

hi@scite.ai

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.