2021
DOI: 10.3389/fdata.2021.661501

Scalable Declarative HEP Analysis Workflows for Containerised Compute Clouds

Abstract: We describe a novel approach for experimental High-Energy Physics (HEP) data analyses that is centred around the declarative rather than imperative paradigm when describing analysis computational tasks. The analysis process can be structured in the form of a Directed Acyclic Graph (DAG), where each graph vertex represents a unit of computation with its inputs and outputs, and the graph edges describe the interconnection of various computational steps. We have developed REANA, a platform for reproducible data a…
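The DAG structure described in the abstract can be illustrated with a minimal Python sketch. It assumes a hypothetical specification format in which each step declares only the files it reads and writes. The step and file names (skim, histogram, systematics, fit, plot, raw.root and so on) are illustrative. This is not REANA's actual workflow specification language. Edges are inferred from data dependencies, and Python's standard `graphlib` module (Python 3.9+) derives an execution order along with groups of steps that could run concurrently.

```python
import graphlib

# Hypothetical declarative workflow: each vertex declares only its inputs and
# outputs; no execution order is written down by the analyst.
WORKFLOW = {
    "skim":        {"inputs": ["raw.root"],                "outputs": ["skimmed.root"]},
    "histogram":   {"inputs": ["skimmed.root"],            "outputs": ["hists.root"]},
    "systematics": {"inputs": ["skimmed.root"],            "outputs": ["syst.root"]},
    "fit":         {"inputs": ["hists.root", "syst.root"], "outputs": ["fit.json"]},
    "plot":        {"inputs": ["hists.root", "fit.json"],  "outputs": ["plot.png"]},
}


def build_dag(spec):
    """Map each step to the set of steps producing the files it consumes.

    Inputs with no producer (e.g. raw.root) are treated as external data.
    """
    producer = {out: step for step, io in spec.items() for out in io["outputs"]}
    return {
        step: {producer[f] for f in io["inputs"] if f in producer}
        for step, io in spec.items()
    }


def execution_order(spec):
    """One valid serial order; raises graphlib.CycleError if the graph has a cycle."""
    return list(graphlib.TopologicalSorter(build_dag(spec)).static_order())


def parallel_batches(spec):
    """Group steps into waves; steps in the same wave do not depend on each other."""
    sorter = graphlib.TopologicalSorter(build_dag(spec))
    sorter.prepare()
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)
    return batches


if __name__ == "__main__":
    print(execution_order(WORKFLOW))
    print(parallel_batches(WORKFLOW))
```

Here `parallel_batches(WORKFLOW)` places `histogram` and `systematics` in the same wave, because both depend only on `skim`. A scheduler working from a declarative graph can dispatch such vertices in parallel. An imperative script, by contrast, hard-codes a single sequence.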

Cited by 4 publications (4 citation statements)
References 23 publications
“…The REANA approach was successfully tested in several data production and data analysis scenarios. For example, REANA was used for ATLAS reinterpretation searches for new physics or CMS reconstruction and jet energy corrections [93]. In the CERN Open Data portal, REANA is used both on the "data production" side to ensure the correctness of preserved data sets' provenance information [90] as well as on the "data analysis" side by running several data analysis examples such as CMS open data Higgs-to-four-lepton example analysis (see Fig.…”
Section: REANA Reproducible Analyses (mentioning, confidence: 99%)
“…The integration with source code management platforms such as GitLab allows researchers to develop analyses on GitLab and run either full data analysis tasks on REANA or at least test the correctness of analysis workflow after each code change. If an analysis is developed in this "continuous integration" manner [93], the preservation of knowledge associated with the data analysis as well as the future deposit of analysis assets into digital repositories are largely facilitated. REANA therefore complements the data preservation repositories by promoting active "preproducibility" of data analyses during the active analysis phase rather than only relying on passive data deposition and subsequent "reproducibility" once the analysis is completed [94].…”
Section: REANA Reproducible Analyses (mentioning, confidence: 99%)
“…This approach, however, integrates over a huge phase space and often yields suboptimal values for JSS [251]. With the help of declarative and therefore consistently repeatable workflows [254] and machine-learning techniques, this approach can be significantly improved. The generator settings can be adjusted iteratively by repeating dedicated JSS measurements, consequently yielding optimal settings for the given suite of measurements.…”
Section: Iterative Monte Carlo Generator Tuning (mentioning, confidence: 99%)
“…The support of loop workflows in iDDS makes it possible. Currently it's already integrated with PanDA and REANA [12]. It has successfully been tested with a mono-Hbb analysis.…”
Section: 3.1 Multiple-Steps Task Chain (mentioning, confidence: 99%)