2019
DOI: 10.1051/epjconf/201921406029

RDataFrame: Easy Parallel ROOT Analysis at 100 Threads

Abstract: The Physics programmes of LHC Run III and HL-LHC challenge the HEP community. The volume of data to be handled is unprecedented at every step of the data processing chain: analysis is no exception. Physicists must be provided with first-class analysis tools which are easy to use, exploit bleeding edge hardware technologies and allow to seamlessly express parallelism. This document discusses the declarative analysis engine of ROOT, RDataFrame, and gives details about how it allows to profitably exploit commodit…

Cited by 26 publications (21 citation statements)
References 4 publications
“…The functionality of the machine learning libraries is then fully accessible for analysis. ROOT implements this feature on top of the RDataFrame [15] infrastructure with the method AsNumpy, which allows the analyst to perform computationally expensive preprocessing of the data in compiled C++ code and load only the required data to memory. See figure 1 for a code example which shows the loading of data from a ROOT file to memory and subsequently pushing the data to common Python-based data analysis facilities such as Pandas [16].…”
Section: Interoperability With the Machine Learning Ecosystem (mentioning)
confidence: 99%
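The workflow this statement describes (select in compiled code, then hand only the needed columns to Python) might look roughly like the sketch below. This is not the figure from the citing paper. It uses the PyROOT calls `RDataFrame`, `Filter` and `AsNumpy`; the function name `load_selected_columns` and its parameters are hypothetical, and ROOT and pandas are assumed to be installed when the function is called.

```python
def load_selected_columns(file_name, tree_name, columns, selection):
    """Apply a selection inside ROOT, then return the chosen columns as a pandas DataFrame.

    Hypothetical helper for illustration; all argument values are supplied by the caller.
    """
    # Imported lazily so the module can be loaded on machines without ROOT.
    import ROOT
    import pandas as pd

    # Build a dataframe on top of the tree stored in the ROOT file.
    df = ROOT.RDataFrame(tree_name, file_name)

    # Filter is lazy: the string expression is compiled by ROOT and nothing
    # is read from disk until an action is requested.
    selected = df.Filter(selection)

    # AsNumpy runs the event loop and returns a dict mapping each requested
    # column name to a NumPy array holding only the selected entries.
    arrays = selected.AsNumpy(columns=columns)

    return pd.DataFrame(arrays)
```

A call such as `load_selected_columns("events.root", "Events", ["pt", "eta"], "pt > 20")` would then yield a DataFrame with two columns; the file, tree and column names here are placeholders.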
“…The focused DSL developments for analyses are relatively new, but a DSL has been long embedded within the ROOT framework (Brun and Rademakers, 1997) under the guise of TTreeFormula, TTree::Draw and TTree::Scan, which allow visual or textual representation of TTree contents for simple and quick exploratory analysis. This DSL is however limited only to simple arithmetic operations, mathematical functions and basic selection criteria. Recently, ROOT developers introduced RDataFrame, a tool to process and analyze columnar datasets as a modern alternative for data analysis (Piparo et al., 2019). Although RDataFrame is not a DSL itself, it implements declarative analysis by using keywords for transformations (e.g., filtering data, defining new variables) and actions (e.g., creating histograms), and is interfaced to the ROOT classes TTreeReader and TTreeDraw.…”
Section: Introduction: Domain Specific Languages For High Energy Physics Analysis (mentioning)
confidence: 99%
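The split between transformations and actions can be illustrated with a toy, pure-Python pipeline. This is only a sketch of the idea, not RDataFrame's implementation or API: the class `LazyFrame` is hypothetical, its method names merely echo RDataFrame's `Filter`, `Define`, `Count` and `Histo1D`, and unlike in ROOT the actions here run the loop immediately instead of returning a lazy result.

```python
class LazyFrame:
    """Toy declarative pipeline: transformations are recorded, actions run the loop."""

    def __init__(self, rows, steps=()):
        self._rows = rows            # iterable of dicts, one per event
        self._steps = tuple(steps)   # recorded transformations, in order

    def Filter(self, predicate):
        # Transformation: record the cut, do not touch the data yet.
        return LazyFrame(self._rows, self._steps + (("filter", predicate),))

    def Define(self, name, func):
        # Transformation: record a new column computed from each row.
        return LazyFrame(self._rows, self._steps + (("define", (name, func)),))

    def _run(self):
        # Single pass over the events applying all recorded steps in order.
        for row in self._rows:
            row = dict(row)  # copy so the input rows are never modified
            keep = True
            for kind, payload in self._steps:
                if kind == "filter":
                    if not payload(row):
                        keep = False
                        break
                else:
                    name, func = payload
                    row[name] = func(row)
            if keep:
                yield row

    def Count(self):
        # Action: number of events surviving all filters.
        return sum(1 for _ in self._run())

    def Histo1D(self, column, nbins, low, high):
        # Action: fixed-width histogram of one column as a list of bin counts.
        counts = [0] * nbins
        width = (high - low) / nbins
        for row in self._run():
            value = row[column]
            if low <= value < high:
                counts[int((value - low) / width)] += 1
        return counts


# Made-up events for illustration only.
events = [
    {"pt": 5.0, "eta": 0.1},
    {"pt": 25.0, "eta": -1.2},
    {"pt": 40.0, "eta": 2.0},
]

frame = (
    LazyFrame(events)
    .Filter(lambda r: r["pt"] > 20.0)
    .Define("abs_eta", lambda r: abs(r["eta"]))
)
n_selected = frame.Count()
hist = frame.Histo1D("abs_eta", 2, 0.0, 3.0)
```

The analyst states what to select and what to compute; how the loop over events is executed stays inside the frame, which is the property that lets a real engine parallelize it without changes to user code.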
“…Column-wise data analysis [7], in which a single operation on a vector of events replaces calculations on individual events serially, is seen as a way for the field to take advantage of vector processing units in modern CPUs, leading to significant speed-ups in throughput. Also, declarative programming paradigms [8] can make it simpler for physicists to intuitively code their analysis.…”
Section: Introduction (mentioning)
confidence: 99%