Monitoring of High Performance Computing (HPC) platforms is critical to successful operations, can provide insights into performance-impacting conditions, and can inform methodologies for improving science throughput. However, monitoring systems are generally not considered core capabilities, either in system requirements specifications or in vendor development strategies. In this paper we present work performed at a number of large-scale HPC sites toward developing monitoring capabilities that fill current gaps in ease of problem identification and root cause discovery. We also present our collective views, based on the experiences presented, on needs and requirements for enabling development by vendors or users of effective, sharable, end-to-end monitoring capabilities.
The prompt reconstruction of the data recorded from the Large Hadron Collider (LHC) detectors has always been addressed by dedicated resources at the CERN Tier-0. Such workloads come in spikes due to the nature of the operation of the accelerator, and on special high-load occasions the experiments have commissioned methods to distribute (spill over) a fraction of the load to sites outside CERN. The present work demonstrates a new way of supporting the Tier-0 environment by provisioning resources elastically for such spilled-over workflows on the Piz Daint Supercomputer at CSCS. This is implemented using containers, tuning the existing batch scheduler and reinforcing the scratch file system, while still using standard Grid middleware. ATLAS, CMS and CSCS have jointly run selected prompt data reconstruction on up to several thousand cores on Piz Daint in a shared environment, thereby probing the viability of the CSCS high performance computing site as an on-demand extension of the CERN Tier-0, which could play a role in addressing the future LHC computing challenges of the High-Luminosity LHC.
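To make the provisioning model concrete, the sketch below shows one way a containerized reconstruction payload could be handed to a Slurm-managed system such as Piz Daint. It is a minimal illustration only: the container image, job name, constraint, payload script and the Shifter-style invocation are hypothetical placeholders and do not reproduce the actual ATLAS/CMS Tier-0 workflow or the Grid middleware layers described in the abstract.

```python
"""Illustrative sketch: elastically submitting a containerized reconstruction
task to a Slurm batch system. Image name, constraint and payload path are
hypothetical placeholders, not the experiments' actual configuration."""
import subprocess
import textwrap


def build_job_script(image: str, n_nodes: int, walltime: str) -> str:
    # Slurm batch script that runs the payload inside a container runtime.
    # CSCS provides Shifter-style container support; the exact invocation
    # here is only indicative.
    return textwrap.dedent(f"""\
        #!/bin/bash
        #SBATCH --job-name=t0-spillover
        #SBATCH --nodes={n_nodes}
        #SBATCH --time={walltime}
        srun shifter --image={image} /payload/run_reco.sh
        """)


def submit(script: str) -> str:
    # sbatch accepts the batch script on stdin; return its confirmation line.
    result = subprocess.run(["sbatch"], input=script, text=True,
                            capture_output=True, check=True)
    return result.stdout.strip()


if __name__ == "__main__":
    job = build_job_script("experiment/reco:latest", n_nodes=4, walltime="02:00:00")
    print(job)            # inspect the generated batch script
    # print(submit(job))  # uncomment only on a system where Slurm is available
```

In an elastic spill-over scenario, a driver like this would be invoked only when the Tier-0 backlog grows, so the HPC resources are claimed on demand rather than held permanently.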
The scale of Leadership Class Systems presents unique challenges to the features and performance of operating system services. This paper reports results of comprehensive evaluations of two Light Weight Operating Systems (LWOS), Cray's Catamount Virtual Node (CVN) and Cray Linux Environment (CLE), on the exact same large-scale hardware. The evaluation was carried out over a 5-month period on NERSC's 19,480-core Cray XT-4, Franklin, using a comprehensive evaluation method that spans Performance, Effectiveness, Reliability, Consistency and Usability criteria for all major subsystems and features. The paper presents the results of the comparison between CVN and CLE, evaluates their relative strengths, and reports observations regarding the world's largest Cray XT-4 as well.
A short description of work completed at NERSC over the past six months to identify and remedy asymmetries in the batch compute resources provided by NERSC's IBM SP, seaborg.nersc.gov. Background: NERSC's IBM SP consists of 375 MHz Nighthawk II POWER3 nodes.
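As a loose illustration of the kind of asymmetry hunting this note describes, the sketch below compares per-node timings from a uniform benchmark run and flags nodes that deviate from the median. The input format (node name and seconds per line) and the 10% tolerance are assumptions for illustration, not NERSC's actual procedure.

```python
"""Illustrative sketch: flagging batch nodes whose benchmark timings deviate
from their peers. Input format and tolerance are assumptions only."""
import statistics
import sys


def find_outliers(timings: dict[str, float], tolerance: float = 0.10) -> dict[str, float]:
    """Return nodes whose runtime differs from the median by more than `tolerance`."""
    median = statistics.median(timings.values())
    return {node: t for node, t in timings.items()
            if abs(t - median) / median > tolerance}


if __name__ == "__main__":
    # Expected input lines on stdin: "<node_name> <benchmark_seconds>"
    timings = {}
    for line in sys.stdin:
        node, seconds = line.split()
        timings[node] = float(seconds)
    for node, t in sorted(find_outliers(timings).items()):
        print(f"{node}: {t:.1f}s (suspect)")
```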