2020
DOI: 10.1051/epjconf/202024503009

Distributed data analysis with ROOT RDataFrame

Abstract: Widespread distributed processing of big datasets has been around for more than a decade now thanks to Hadoop, but only recently higher-level abstractions have been proposed for programmers to easily operate on those datasets, e.g. Spark. ROOT has joined that trend with its RDataFrame tool for declarative analysis, which currently supports local multi-threaded parallelisation. However, RDataFrame’s programming model is general enough to accommodate multiple implementations or backends: users could write their …

Cited by 12 publications (10 citation statements)
References 7 publications (5 reference statements)

Citation statements
“…During the coming years, ROOT will continue to work on providing efficient and easy to use interfaces. Examples include interfaces to ML [7], for transparent use of multi-core and GPUs [2,8], for distributed computing [9], to Python [10], and for data visualization.…”
Section: Data Format | Citation type: mentioning
confidence: 99%
“…The engine relies on the distributed RDataFrame Python package [28]. This is an extension of the RDataFrame interface that wraps the computations issued by the user in their application code in a MapReduce [29] pattern.…”
Section: A Overview | Citation type: mentioning
confidence: 99%
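
The MapReduce pattern mentioned in the statement above can be illustrated with a minimal, pure-Python sketch. It is not the API of the distributed RDataFrame package and does not use ROOT; the names `split_ranges`, `mapper` and `reducer` and the toy data are hypothetical. It only shows the general idea: split the entries into ranges, fill a partial histogram per range (map), then merge the partial histograms (reduce).

```python
from collections import Counter
from functools import reduce


def split_ranges(n_entries, n_parts):
    """Split [0, n_entries) into n_parts contiguous, near-equal ranges."""
    step, extra = divmod(n_entries, n_parts)
    ranges, start = [], 0
    for i in range(n_parts):
        # The first `extra` ranges take one additional entry each.
        end = start + step + (1 if i < extra else 0)
        ranges.append((start, end))
        start = end
    return ranges


def mapper(data, entry_range, n_bins, lo, hi):
    """Map step: fill a partial histogram from one range of entries."""
    start, end = entry_range
    width = (hi - lo) / n_bins
    hist = Counter()
    for x in data[start:end]:
        if lo <= x < hi:  # values outside [lo, hi) are ignored
            hist[int((x - lo) / width)] += 1
    return hist


def reducer(a, b):
    """Reduce step: merge two partial histograms bin by bin."""
    return a + b


# Toy dataset standing in for one column of a larger dataset (hypothetical).
data = [0.5, 1.5, 2.5, 3.5, 1.2, 2.7, 0.1, 3.9]

# In a distributed setting each mapper call would run on a different worker;
# here they run sequentially in one process.
partials = [mapper(data, r, 4, 0.0, 4.0) for r in split_ranges(len(data), 3)]
merged = reduce(reducer, partials)

# Processing all entries in a single range gives the same histogram.
serial = mapper(data, (0, len(data)), 4, 0.0, 4.0)
```

Because the merge is a bin-wise sum, the order in which partial results arrive does not change the final histogram, which is what makes this kind of operation suitable for a map-and-reduce split.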
“…Although podio and the EDMs are targeted more at the usage inside HEP software frameworks where the rich structure of objects and relations among them is a necessary feature, we would like to explore the possibilities of also supporting flat data formats where such information is much harder to represent. The ROOT backend writes TTrees that can be loaded into an RDataFrame [17], which can be used to gain some first experience and to collect some feedback on how the current EDM4hep can be used in such a context. We find that from a purely technical point of view it would already be possible to implement a usable analysis framework using RDataFrame on top of the usual file structure that is produced by podio.…”
Section: Support For Flat Data Formats | Citation type: mentioning
confidence: 99%