Abstract. The statistical analysis of infrastructure metrics comes with several specific challenges, including the fairly large volume of unstructured metrics from a large set of independent data sources. Hadoop and Spark provide an ideal environment, in particular for the first step of rapidly skimming through hundreds of TB of low-relevance data to find and extract the much smaller data volume that is relevant for statistical analysis and modelling. This presentation will describe the new Hadoop service at CERN and the use of several of its components for high-throughput data aggregation and ad-hoc pattern searches. We will describe the hardware setup used, the service structure built from a small set of decoupled clusters, and the first experience with co-hosting different applications and performing software upgrades. We will further detail the common infrastructure used for data extraction and preparation from continuous monitoring and database input sources.
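As a rough illustration of the skim-and-aggregate step described above, the following PySpark sketch reads raw monitoring records, filters them down to the relevant subset and reduces them to an hourly summary. The paths, field names and filter criteria are purely hypothetical; the abstract does not describe the actual CERN pipelines at this level of detail.

```python
# Illustrative PySpark sketch of a "skim and aggregate" pass over raw
# monitoring records. Paths, schema fields and thresholds are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("metrics-skim").getOrCreate()

# Read a large volume of semi-structured monitoring records (JSON here).
raw = spark.read.json("hdfs:///monitoring/raw/2016/*/*.json")

# Skim: keep only the records relevant to the analysis at hand, e.g. one
# service and one metric family over a restricted time window.
skimmed = (raw
           .filter(F.col("service") == "eos")
           .filter(F.col("metric").startswith("io."))
           .filter(F.col("timestamp").between("2016-05-01", "2016-06-01")))

# Aggregate: reduce the large input to a small summary table that can be
# handed to the statistical analysis and modelling step.
summary = (skimmed
           .groupBy(F.window(F.col("timestamp").cast("timestamp"), "1 hour"),
                    "host", "metric")
           .agg(F.avg("value").alias("mean"),
                F.max("value").alias("peak"),
                F.count("*").alias("n_samples")))

summary.write.mode("overwrite").parquet("hdfs:///monitoring/skimmed/eos_io_hourly")
```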
The CERN IT Storage Group ensures the symbiotic development and operations of storage and data transfer services for all CERN physics data, in particular the data generated by the four LHC experiments (ALICE, ATLAS, CMS and LHCb).
In order to meet the objectives of the next run of the LHC (Run-3), the Storage Group has undertaken a thorough analysis of the experiments’ requirements, matching them to the appropriate storage and data transfer solutions and carrying out a rigorous programme of testing to identify and resolve any issues before the start of Run-3.
In this paper, we present the main challenges presented by each of the four LHC experiments. We describe their workflows, in particular how they communicate with and use the key components provided by the Storage Group: the EOS disk storage system; its archival back-end, the CERN Tape Archive (CTA); and the File Transfer Service (FTS). We also describe the validation and commissioning tests that have been undertaken and the challenges overcome: the ATLAS stress tests to push their DAQ system to its limits; the CMS migration from PhEDEx to Rucio, followed by large-scale tests between EOS and CTA with the new FTS “archive monitoring” feature; the LHCb Tier-0 to Tier-1 staging tests and XRootD Third Party Copy (TPC) validation; and the erasure-coding performance tests in ALICE.
Modern scientific experiments collect vast amounts of data that must be catalogued to meet multiple use cases and search criteria. In particular, high-energy physics experiments currently in operation produce several billion events per year. A database holding references to the files that contain each event, at every stage of processing, is necessary in order to retrieve selected events from the data storage systems. The ATLAS EventIndex project is developing a way to store the necessary information using modern data storage technologies that allow key-value pairs to be kept in memory, and to select the best tools to support this application in terms of performance, robustness and ease of use.
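The key-value layout such an event index needs can be sketched as follows. This is a minimal illustration in plain Python, not the EventIndex implementation: the key format, processing-stage names and file paths are invented for the example, and the real project targets dedicated key-value stores rather than an in-memory dictionary.

```python
# Minimal sketch of an event-index record: for each (run, event) pair,
# references to the files that contain that event at each processing stage.
# Key format, stage names and file paths are illustrative only.
from typing import Optional

def make_key(run_number: int, event_number: int) -> bytes:
    """Compose a compact, sortable key from run and event number."""
    return f"{run_number:08d}-{event_number:012d}".encode()

# Value: one file reference per processing stage (e.g. RAW, AOD, ...).
index = {}
index[make_key(284500, 1387245)] = {
    "RAW": "root://eosatlas.cern.ch//atlas/raw/data16/run284500/file_0042.data",
    "AOD": "root://eosatlas.cern.ch//atlas/aod/data16/run284500/file_0007.root",
}

def lookup(run_number: int, event_number: int, stage: str) -> Optional[str]:
    """Return the file reference for a given event and processing stage."""
    record = index.get(make_key(run_number, event_number))
    return record.get(stage) if record else None

print(lookup(284500, 1387245, "AOD"))
```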