Abstract—The total dataset produced by the BaBar experiment at the Stanford Linear Accelerator Center (SLAC) currently comprises roughly 3×10⁹ data events and an equal number of simulated events, corresponding to 23 Tbytes of real data and 51 Tbytes of simulated events. Since an individual analysis typically selects only a very small fraction of all events, it would be extremely inefficient if each analysis had to process the full dataset. A first, centrally managed analysis step is therefore a common pre-selection ("skimming") of all data according to very loose, inclusive criteria, which facilitates data access for later analysis. Usually, several analyses share common selection criteria; however, these criteria may change over time, e.g., when new analyses are developed. Currently, O(100) such pre-selection streams ("skims") are defined. In order to provide timely access to newly created or modified skims, the complete dataset must be processed several times a year. Additionally, newly taken or simulated data has to be skimmed as it becomes available. The system currently deployed for skim production uses 1800 CPUs distributed over three production sites. With it, the complete dataset could be processed within about 3.5 months. We report on the stability and the performance of the system.
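The skimming step described above can be pictured as routing each event into every pre-selection stream whose loose, inclusive criterion it satisfies, so a single event may land in several skims. The following is a minimal sketch of that idea; the skim names and selection criteria are hypothetical illustrations, not BaBar's actual skim definitions.

```python
# Minimal sketch of inclusive 'skimming': every event is tested against
# several loosely defined, possibly overlapping selection streams.
# Skim names and cuts below are illustrative assumptions, not real BaBar skims.

def make_skims():
    # Each skim is a name plus a loose, inclusive predicate over one event.
    return {
        "two_track": lambda ev: ev["n_tracks"] >= 2,
        "high_energy": lambda ev: ev["total_energy"] > 5.0,  # GeV, illustrative cut
        "muon_pair": lambda ev: ev["n_muons"] >= 2,
    }

def skim(events, skims):
    """Route each event into every skim stream whose criterion it passes."""
    streams = {name: [] for name in skims}
    for ev in events:
        for name, passes in skims.items():
            if passes(ev):
                streams[name].append(ev)
    return streams

# Two toy events: the first passes two skims, the second passes one.
events = [
    {"n_tracks": 3, "total_energy": 7.2, "n_muons": 0},
    {"n_tracks": 1, "total_energy": 2.1, "n_muons": 2},
]
streams = skim(events, make_skims())
```

Because the criteria are inclusive rather than exclusive, the output streams overlap; this is what lets many downstream analyses share one centrally produced pass over the full dataset instead of each reading all events themselves.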