Popularity framework to process dataset traces and its application on dynamic replica reduction in the ATLAS experiment

Molfetas, A.; Megino, Fernando Barreiro; Tykhonov, A.; Lassnig, M.; Garonne, V.; Barisits, Martin; Dimitrov, Gancho; Jézéquel, S.; Ueda, I.; Viegas, F. Tique Aires

doi:10.1088/1742-6596/331/6/062018

Cited by 12 publications

(14 citation statements)

References 2 publications


“…These datasets are produced from the RAW dataset that was the most popular among RAW datasets based on information from DQ2 Popularity service on September 2011 [3]: "data11_7TeV.00184130.physics_JetTauEtmiss.merge.RAW". The obtained data will help to track the behavior of derived datasets usage.…”

Section: Evaluation Of Panda Data (mentioning)

confidence: 99%


A Probabilistic Analysis of Data Popularity in ATLAS Data Caching

Titov, Záruba, Klimentov et al. 2012, J. Phys.: Conf. Ser.


Abstract. One of the most important aspects in any computing distribution system is efficient data replication over storage or computing centers, that guarantees high data availability and low cost for resource utilization. In this paper we propose a data distribution scheme for the production and distributed analysis system PanDA at the ATLAS experiment. Our proposed scheme is based on the investigation of data usage. Thus, the paper is focused on the main concepts of data popularity in the PanDA system and their utilization. Data popularity is represented as the set of parameters that are used to predict the future data state in terms of popularity levels.



“…Data deletion is also demand driven (by the Replica Reduction Agent), reducing the numbers of replicas for unpopular data to get space for more popular data [3]. This dynamic model has led to substantial improvements in efficient utilization of storage and processing resources [4].…”

Section: Introduction (mentioning)

confidence: 99%

A Probabilistic Analysis of Data Popularity in ATLAS Data Caching

Titov, Záruba, Klimentov et al. 2012, J. Phys.: Conf. Ser.


“…The popularity service described in reference [9] uses the recorded trace messages to analyse the popular and unpopular files, and support dynamic replica reduction.…”

Section: Popularity and Dynamic Deletion Of Unpopular Replica (mentioning)

confidence: 99%

The ATLAS DDM Tracer monitoring framework

Zang, Garonne, Barisits et al. 2012, J. Phys.: Conf. Ser.

Self Cite


“…The strategy was first to distribute the minimal replicas as 'primary' and some extra as 'secondary' that are foreseen to be used, adding more 'secondary' replicas following the usage and needs, and removing unused 'secondary' replicas to ensure enough free space for further prompt replication, especially for new data or popular datasets. A system to measure data set popularity was established that recorded the number of accesses per dataset and per file from different activities, and another system for auto-cleaning that selects dataset replicas to be deleted based on the popularity accounting was developed for that purpose [8,10].…”

Section: Data Distribution Over the Grid (mentioning)

confidence: 99%

ATLAS Distributed Computing Operations in the First Two Years of Data Taking

Ueda 2012, Proceedings of the International Symposium on Grids and Clouds (ISGC) 2012, PoS(ISGC 2012)

Self Cite


The ATLAS experiment has had two years of steady data taking in 2010 and 2011. Data are calibrated, reconstructed, distributed and analysed at over 100 different sites using the Worldwide LHC Computing Grid. Following the experience in 2010, the data distribution policies were revised to address scalability issues due to the increase in luminosity and trigger rate in 2011. The structure in the ATLAS computing model has also been revised to optimise the usage of the resources, according to effective transfer rates between sites and site availability. Some new infrastructures were introduced for the software installation at the sites and for database access to reduce the bottlenecks in the data processing. Issues in the end-user analysis were studied and an automated control system of the analysis queues based on functional tests has been introduced. The monitoring and accounting tools have been developed and provide views of the ATLAS activities by categories. In this talk, we will report on the operational experience and evolution in the ATLAS Distributed Computing and on the system performance during the first two years of operation.

