2021
DOI: 10.1051/epjconf/202125103001

Columnar data analysis with ATLAS analysis formats

Abstract: Future analysis of ATLAS data will involve new small-sized analysis formats to cope with the increased storage needs. The smallest of these, named DAOD_PHYSLITE, has calibrations already applied to allow fast downstream analysis and avoid the need for further analysis-specific intermediate formats. This allows for application of the “columnar analysis” paradigm where operations are applied on a per-array instead of a per-event basis. We will present methods to read the data into memory, using Uproot, and also …
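The per-event versus per-array distinction in the abstract can be illustrated with a minimal sketch. It assumes NumPy and hypothetical toy data: jet transverse momenta for three events, held as one flat array plus an offsets array marking event boundaries, which is how jagged (variable-length) data is commonly laid out in memory by columnar tools. The function names are illustrative only and are not part of Uproot, Awkward Array, or any ATLAS software.

```python
import numpy as np

# Hypothetical toy data: jet transverse momenta (GeV) for three events,
# stored as one flat array of all jets plus offsets marking event boundaries.
jet_pt = np.array([55.0, 32.0, 18.0, 120.0, 41.0, 25.0])
offsets = np.array([0, 3, 3, 6])  # event 0: jets 0-2, event 1: no jets, event 2: jets 3-5
counts = np.diff(offsets)         # number of jets in each event: [3, 0, 3]


def n_hard_jets_per_event(pt, offsets, threshold):
    """Per-event style: an explicit Python loop over events."""
    result = []
    for start, stop in zip(offsets[:-1], offsets[1:]):
        # Count jets above threshold inside this event's slice.
        result.append(sum(1 for x in pt[start:stop] if x > threshold))
    return np.array(result)


def n_hard_jets_columnar(pt, counts, threshold):
    """Columnar style: whole-array operations, then a per-event reduction."""
    passed = (pt > threshold).astype(float)                   # mask over all jets at once
    event_index = np.repeat(np.arange(len(counts)), counts)   # owning event of each jet
    # Sum the mask per event; minlength keeps events that have no jets.
    return np.bincount(event_index, weights=passed, minlength=len(counts)).astype(int)


print(n_hard_jets_per_event(jet_pt, offsets, 30.0))  # [2 0 2]
print(n_hard_jets_columnar(jet_pt, counts, 30.0))    # [2 0 2]
```

Both functions agree; the columnar version replaces the Python loop over events with a few whole-array operations, which is what makes the calculation vectorizable. Libraries such as Awkward Array hide this offsets bookkeeping behind reductions like `ak.count_nonzero(..., axis=1)`.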

Cited by 3 publications (2 citation statements)
References: 9 publications
“…• Following the Pandas DataFrame model (originally from R), columns (named, typed attributes of all entries in a dataset) are becoming more visible in analysis at the expense of rows, which are instances of identically typed data, usually collision events in HEP. The word "columnar analysis" is frequently applied to emphasize column-granularity, whether it is for internal data engineering (transferring less data or vectorizing a calculation) or it is highly visible to users as array-at-a-time operations [14]. Array-at-a-time interfaces bridge the gap between interactive tinkering (in the style of Pandas or TTree::Draw) and production-ready analysis scripts.…”
Section: The HEP Analysis Software Landscape Is Changing (mentioning)
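The row versus column distinction in the quoted statement can be sketched the same way. This minimal example again assumes NumPy and uses hypothetical toy event records; `run`, `met` and `n_jets` are illustrative field names, not PHYSLITE branch names. The same selection is written once row-at-a-time and once array-at-a-time.

```python
import numpy as np

# Hypothetical toy events stored row-wise: one record per collision event.
rows = [
    {"run": 1, "met": 35.2, "n_jets": 3},
    {"run": 1, "met": 12.7, "n_jets": 0},
    {"run": 2, "met": 88.1, "n_jets": 3},
]

# The same data stored column-wise: one typed array per attribute.
columns = {name: np.array([row[name] for row in rows]) for name in rows[0]}

# Row-at-a-time selection: visit each event and test its fields.
selected_rows = [row for row in rows if row["met"] > 30.0 and row["n_jets"] >= 2]

# Array-at-a-time selection: one boolean mask built from whole columns,
# touching only the two columns the cut actually needs.
mask = (columns["met"] > 30.0) & (columns["n_jets"] >= 2)
selected_met = columns["met"][mask]

print(selected_met)  # [35.2 88.1]
```

Both selections keep the first and third events. The columnar version reads only the `met` and `n_jets` arrays, which mirrors why column-granular storage lets an analysis transfer less data.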
“…Timelines and data amounts will be defined in accordance with this policy. For Level-3 data, in particular, real collisions and simulated datasets will be released after casting them into the new PHYSLITE [14] format, which is a simplified scheme that contains calibrated objects and also information to compute systematic uncertainties. In addition, special datasets may be approved for release.…”
Section: ATLAS Open Data (mentioning)