2021
DOI: 10.1051/epjconf/202125102061
Coffea-casa: an analysis facility prototype

Abstract: Data analysis in HEP has often relied on batch systems and event loops; users are given a non-interactive interface to computing resources and consider data event-by-event. The “Coffea-casa” prototype analysis facility is an effort to provide users with alternate mechanisms to access computing resources and enable new programming paradigms. Instead of the command-line interface and asynchronous batch access, a notebook-based web interface and interactive computing is provided. Instead of writing event loops, t…
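The abstract's contrast between event loops and the newer paradigm can be sketched in a few lines. This is an illustrative example only (not code from the paper), using NumPy arrays as stand-ins for columns of event data; the column name and cut value are invented:

```python
# Illustrative sketch: event-loop style vs. columnar style selection.
# NumPy arrays stand in for columns of event data; "pt" and the 25 GeV
# threshold are hypothetical, not taken from the paper.
import numpy as np

rng = np.random.default_rng(0)
pt = rng.exponential(scale=30.0, size=1_000)  # hypothetical transverse-momentum column

# Event-loop style: visit each event one at a time.
selected_loop = []
for value in pt:
    if value > 25.0:
        selected_loop.append(value)

# Columnar style: one vectorized operation over the whole column at once.
selected_columnar = pt[pt > 25.0]

assert len(selected_loop) == len(selected_columnar)
```

In the columnar paradigm, the per-event `if` disappears into a single array expression, which is what tools like coffea and Awkward Array generalize to jagged HEP data.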

Cited by 7 publications (7 citation statements)
References 18 publications
“…Data delivery is less generic, in that HEP datasets have specialized formats, considerable tooling, and optimizable properties, such as statistically independent events and the columnar layouts of TTrees. Three IRIS-HEP projects, namely ServiceX [31], SkyhookDM [32], and coffea-casa [40], use generic data science tools to build HEP-specific workflows. These are good examples of the "mixed future," in which Docker, Kubernetes, Helm, Minio, Flask, RabbitMQ, Kafka, Ceph, and Gandiva are used alongside ROOT, Rucio, XCache, and Uproot to deliver columns of data to analyses as Arrow or Awkward Array buffers, Parquet or ROOT files.…”
Section: Distributed Computing
confidence: 99%
“…It is being packaged in a way that it can be deployed on clusters outside of Nebraska. Further explanation of the concepts and demonstrations of the facility can be found in a paper for the CHEP 2021 conference [21].…”
Section: University Of Nebraska
confidence: 99%
“…Usage of Dask has begun only recently, also brought by the increased popularity in HEP of Python-based interfaces. In particular, it is being explored in the context of the so-called analysis facilities, where different tools are unified in a coherent software stack that can fulfill all of physicists' analysis needs [32]. In this regard, a key feature of Dask is provided by its interfaces with batch computing systems, in particular HTCondor, widely used in HEP computing clusters.…”
Section: Related Work
confidence: 99%
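The Dask usage described above can be sketched minimally. This assumes only the `dask` package; Dask builds a task graph lazily and then executes it on a scheduler, and on a facility the same graph would be dispatched to HTCondor workers (e.g. via dask-jobqueue's `HTCondorCluster`) rather than the local threads used here:

```python
# Minimal sketch of Dask's lazy task-graph model (assumes `dask` is installed).
# On an analysis facility the graph would run on HTCondor workers via
# dask-jobqueue's HTCondorCluster; here the default local scheduler is used.
import dask


@dask.delayed
def square(x):
    # Each call becomes a node in the task graph, not an immediate computation.
    return x * x


# Build a small graph: the sum of squares of 0..3. Nothing runs yet.
total = dask.delayed(sum)([square(i) for i in range(4)])

# Executing the graph: 0 + 1 + 4 + 9 = 14.
print(total.compute())  # 14
```

Swapping the scheduler, not the analysis code, is the point: the same `total.compute()` works unchanged whether the workers are local threads or batch jobs on a cluster.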