2019
DOI: 10.1051/epjconf/201921404057

A prototype for the evolution of ATLAS EventIndex based on Apache Kudu storage

Abstract: The ATLAS EventIndex has been in operation since the beginning of LHC Run 2 in 2015. Like all software projects, its components have been constantly evolving and improving in performance. The main data store in Hadoop, based on MapFiles and HBase, can work for the rest of Run 2 but new solutions are explored for the future. Kudu offers an interesting environment, with a mixture of BigData and relational database features, which look promising at the design level. This environment is used to build a prototype t…

Cited by 4 publications (5 citation statements); references 7 publications.
“…Investigations on several structured storage formats for the main EventIndex data to replace the Hadoop MapFiles started a few years ago [17]. Initially it looked like Apache Kudu [18] would be a good solution, joining BigData storage performance with SQL query capabilities [19]. Unfortunately Kudu did not get a large support in the open-source community and CERN decided not to invest hardware and manpower resources in this technology.…”
Section: System Design Evolution (mentioning, confidence 99%)
“…This improves performance and also availability, as data is also replicated in several tablets on different servers. A careful key and partition schema definition for the EventIndex is being designed [5], with the aim to allow that row entries that define different reprocessings of the same data sit in the same partition and close in disk storage. This fact improves navigation through locality, and also benefits from better compression ratios of the same data due to this locality.…”
Section: New Backend Solution: Kudu (mentioning, confidence 99%)
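The locality idea in the statement above (all reprocessings of the same data sitting in one partition and close together on disk) can be illustrated with a small sketch. It is hypothetical: the composite key `(run_number, event_number, reprocessing)`, the field names, the CRC32 hash and the partition count are assumptions chosen for illustration, not the schema described in [5], and the code only simulates row placement in plain Python rather than calling any Kudu API.

```python
# Hypothetical sketch of EventIndex row placement; NOT the schema from [5].
# Assumed key: (run_number, event_number, reprocessing); hash on the event
# identity only, then sort each partition by the full key.
import zlib
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple

NUM_PARTITIONS = 8  # hypothetical number of hash partitions


class EventRow(NamedTuple):
    run_number: int
    event_number: int
    reprocessing: str  # hypothetical tag identifying one processing pass
    guid: str          # hypothetical reference to the file holding the event


def partition_of(row: EventRow) -> int:
    """Hash only the event identity, never the reprocessing tag, so every
    reprocessing of the same event lands in the same partition."""
    key = f"{row.run_number}:{row.event_number}".encode()
    return zlib.crc32(key) % NUM_PARTITIONS


def place(rows: Iterable[EventRow]) -> Dict[int, List[EventRow]]:
    """Group rows by partition, then order each partition by the full
    composite key, mimicking a store that keeps rows in primary-key order."""
    partitions: Dict[int, List[EventRow]] = defaultdict(list)
    for row in rows:
        partitions[partition_of(row)].append(row)
    for bucket in partitions.values():
        bucket.sort(key=lambda r: (r.run_number, r.event_number, r.reprocessing))
    return dict(partitions)


if __name__ == "__main__":
    # Illustrative rows only: two reprocessings of event 1, one of event 2.
    rows = [
        EventRow(100, 1, "r2", "guid-c"),
        EventRow(100, 2, "r1", "guid-b"),
        EventRow(100, 1, "r1", "guid-a"),
    ]
    for pid, bucket in sorted(place(rows).items()):
        print(pid, [(r.event_number, r.reprocessing) for r in bucket])
```

Leaving the reprocessing tag out of the hash is the choice that matters in this sketch: including it would scatter the reprocessings of one event across partitions and lose the locality (and the compression benefit) that the quoted text relies on.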
“…More details about the implementation on Kudu and the tests that we have been performing related with the Data Collection task can be seen in Ref. [5].…”
Section: New Backend Solution: Kudu (mentioning, confidence 99%)
“…Investigations on several structured storage formats for the main EventIndex data to replace the Hadoop MapFiles [4] used till now started a few years ago [5]. Initially it looked like Apache Kudu [6] would be a good solution, as it joins BigData storage performance with SQL query capabilities [7]. Unfortunately Kudu did not get a sufficiently large support in the open-source community and CERN decided not to invest hardware and human resources in this technology.…”
Section: Introduction (mentioning, confidence 99%)
“…Proceedings of the 9th International Conference "Distributed Computing and Grid Technologies in Science and Education" (GRID'2021), Dubna, Russia, July 5-9, 2021 …”
(mentioning, confidence 99%)