Particle physics has an ambitious and broad experimental programme for the coming decades. This programme requires large investments in detector hardware, either to build new facilities and experiments, or to upgrade existing ones. Similarly, it requires commensurate investment in the R&D of software to acquire, manage, process, and analyse the sheer amounts of data to be recorded. In planning for the HL-LHC in particular, it is critical that all of the collaborating stakeholders agree on the software goals and priorities, and that the efforts complement each other. In this spirit, this white paper describes the R&D activities required to prepare for this software upgrade.
Widespread distributed processing of big datasets has been around for more than a decade now thanks to Hadoop, but only recently have higher-level abstractions, e.g. Spark, been proposed that let programmers operate on those datasets easily. ROOT has joined that trend with its RDataFrame tool for declarative analysis, which currently supports local multi-threaded parallelisation. However, RDataFrame’s programming model is general enough to accommodate multiple implementations or backends: users could write their code once and execute it as-is either locally or in a distributed fashion, just by selecting the corresponding backend. This abstract introduces PyRDF, a new Python library developed on top of RDataFrame to seamlessly switch from local to distributed environments with no changes in the application code. In addition, PyRDF has been integrated with a service for web-based analysis, SWAN, where users can dynamically plug in new resources, as well as write, execute, monitor and debug distributed applications via an intuitive interface.
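The "write once, select a backend" idea can be illustrated with a minimal, self-contained sketch. It does not use ROOT or PyRDF, and none of the names below (`ToyFrame`, `LocalBackend`, `ThreadedBackend`, `transform`, `analysis`) belong to either library; they are hypothetical stand-ins, and a thread pool stands in for a real distributed backend such as Spark.

```python
from concurrent.futures import ThreadPoolExecutor


class LocalBackend:
    """Hypothetical backend: processes every chunk in the calling thread."""

    def map_reduce(self, mapper, reducer, chunks):
        return reducer([mapper(chunk) for chunk in chunks])


class ThreadedBackend:
    """Hypothetical stand-in for a distributed backend, using a thread pool."""

    def __init__(self, workers=4):
        self.workers = workers

    def map_reduce(self, mapper, reducer, chunks):
        with ThreadPoolExecutor(max_workers=self.workers) as pool:
            partial_results = list(pool.map(mapper, chunks))
        return reducer(partial_results)


class ToyFrame:
    """Toy declarative frame: records steps lazily and runs them on sum()."""

    def __init__(self, values, backend, steps=()):
        self.values = list(values)
        self.backend = backend
        self.steps = tuple(steps)

    def transform(self, func):
        # Record a value transformation; nothing is computed yet.
        step = ("transform", func)
        return ToyFrame(self.values, self.backend, self.steps + (step,))

    def filter(self, predicate):
        # Record a selection; nothing is computed yet.
        step = ("filter", predicate)
        return ToyFrame(self.values, self.backend, self.steps + (step,))

    def sum(self, nchunks=4):
        steps = self.steps

        def mapper(chunk):
            # Apply the recorded steps to one chunk and return a partial sum.
            total = 0
            for value in chunk:
                keep = True
                for kind, func in steps:
                    if kind == "transform":
                        value = func(value)
                    elif not func(value):
                        keep = False
                        break
                if keep:
                    total += value
            return total

        # Split the input into roughly equal chunks (ceiling division).
        size = max(1, -(-len(self.values) // nchunks))
        chunks = [self.values[i:i + size]
                  for i in range(0, len(self.values), size)]
        return self.backend.map_reduce(mapper, sum, chunks)


def analysis(backend):
    # The analysis code is identical whichever backend is passed in.
    frame = ToyFrame(range(100), backend)
    return frame.transform(lambda x: x * x).filter(lambda x: x % 2 == 0).sum()


if __name__ == "__main__":
    print(analysis(LocalBackend()))
    print(analysis(ThreadedBackend(workers=3)))
```

Only the object passed to `analysis` changes between the two calls; the chain of declarative steps stays the same, which is the property the abstract attributes to the RDataFrame programming model.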
High-Energy Physics has evolved a rich set of software packages that need to work harmoniously to carry out the key software tasks needed by experiments. The problem of consistently building and deploying these packages as a coherent software stack is one that is shared across the HEP community. To that end, the HEP Software Foundation Packaging Working Group has worked to identify common solutions that can be used across experiments, with an emphasis on consistent, reproducible builds and easy deployment into CernVM-FS or containers via CI systems. We based our approach on well-identified use cases and requirements from many experiments. In this paper we summarise the work of the group in the last year and how we have explored various approaches based on package managers from industry and the scientific computing community. We give details about a solution based on the Spack package manager, which has been used to build the software required by the SuperNEMO and FCC experiments and trialled for a multi-experiment software stack, Key4hep. We discuss the changes that needed to be made to Spack to satisfy all our requirements, and we show how support for a build environment for software developers is provided.
Building, testing and deploying coherent large software stacks is very challenging, in particular when they consist of the diverse set of packages required by the LHC experiments, the CERN Beams Department and data analysis services such as SWAN. These software stacks include several packages (Grid middleware, Monte Carlo generators, Machine Learning tools, Python modules), all available for a large number of compilers, operating systems and hardware architectures. To address this challenge, we developed an infrastructure around a tool called lcgcmake. Dedicated modules are responsible for building the packages and controlling the dependencies in a reliable and scalable way. The distribution relies on a robust and automatic system, responsible for building and testing the packages, installing them on CernVM-FS and packaging the binaries in RPMs and tarballs. This system is orchestrated through Jenkins on build machines provided by the CERN OpenStack facility. The results are published through user-friendly web pages. In this paper we will present an overview of these infrastructure tools and policies. We also discuss the role of this effort within the HEP Software Foundation (HSF). Finally, we will discuss the evolution of the infrastructure towards container (Docker) technologies and the future directions and challenges of the project.
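As a rough illustration of what controlling dependencies involves, the sketch below orders a small stack so that every package is built after the packages it needs. This is not how lcgcmake is implemented; the package names, the `DEPENDENCIES` map and the `build_order` and `build_all` helpers are all hypothetical, and the ordering uses Python's standard `graphlib` module (Python 3.9 or later).

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: package -> packages that must be built first.
DEPENDENCIES = {
    "python": set(),
    "numpy": {"python"},
    "root": {"python", "numpy"},
    "generator": {"root"},
}


def build_order(deps):
    """Return the packages in an order that respects every dependency."""
    return list(TopologicalSorter(deps).static_order())


def build_all(deps, build):
    """Call build(pkg) for each package, dependencies first."""
    built = []
    for pkg in build_order(deps):
        build(pkg)
        built.append(pkg)
    return built


if __name__ == "__main__":
    build_all(DEPENDENCIES, lambda pkg: print(f"building {pkg}"))
```

A circular dependency makes `static_order()` raise `graphlib.CycleError`, so an inconsistent stack definition is rejected before any build starts.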