The Archive Solution for Distributed Workflow Management Agents of the CMS Experiment at LHC

Kuznetsov, Valentin; Fischer, Nils; Guo, Yuyi

doi:10.1007/s41781-018-0005-0

Cited by 3 publications

(4 citation statements)

References 6 publications

“…We found that the Spark platform significantly improved our analytics capabilities. For instance, in the WMArchive [12] system we can promptly perform the following tasks:…”

Section: CMS Monitoring (mentioning)

confidence: 99%

“…log files are usually streamed to HDFS in native data-format (JSON), the database tables are easily converted into CSV data-format, while large unstructured data sets, e.g. in case of the WMArchive system [12], are converted into compact, fast, binary Avro data-format with a pre-defined schema. Fortunately, the HDFS libraries support a broad variety of data-formats, and the Spark framework is guaranteed to work seamlessly and efficiently with all of them.…”

Table from the citing paper that was embedded in the extracted quote:

Data source                               Format   Size
HTCondor logs [8]                         JSON     11.1 TB
AAA (Global Data Access) logs [9]         JSON     11 TB
EOS logs [10]                             JSON     5.3 TB
FTS (File Transfer System) logs [11]      JSON     4.2 TB
PhEDEx snapshots [4]                      CSV      3.3 TB
WMArchive logs [12]                       Avro     1.3 TB
CMSSW (CMS SoftWare framework) logs       Avro     0.5 TB
DBS tables [4]                            CSV      0.3 TB
JobMonitoring logs                        Avro     0.2 TB

Section: Current Landscape (mentioning)

confidence: 99%

“…in case of the WMArchive system [12], are converted into compact, fast, binary Avro data-format with a pre-defined schema. Fortunately, the HDFS libraries support a broad variety of data-formats, and the Spark framework is guaranteed to work seamlessly and efficiently with all of them. Such availability of large datasets and efficient processing on Hadoop clusters open up new possibilities to push the boundaries of analytics tasks beyond traditional approaches based on relational databases.…”

Section: Current Landscape (mentioning)

confidence: 99%

Gaining insight from large data volumes with ease

Kuznetsov

2019

EPJ Web Conf.

Self Cite

Efficient handling of large data-volumes becomes a necessity in today's world. It is driven by the desire to get more insight from the data and to gain a better understanding of user trends which can be transformed into economic incentives (profits, cost-reduction, various optimizations of data workflows and pipelines). In this paper, we discuss how modern technologies are transforming well-established patterns in HEP communities. The new data insight can be achieved by embracing Big Data tools for a variety of use-cases, from analytics and monitoring to training Machine Learning models on a terabyte scale. We provide concrete examples within the context of the CMS experiment where Big Data tools are already playing or would play a significant role in daily operations.

“…Several models to predict the operator's action based on this input have been studied in the last years [2]. Additionally, for each thrown error code a snippet of the error log that contains the occurred exception is stored by the CMS WMArchive service [3]. The WMArchive entries are analyzed with Apache Spark on the CERN SWAN platform for interactive computing [5].…”

Section: Introduction (mentioning)

confidence: 99%

Automatic log analysis with NLP for the CMS workflow handling [Slides]

Abercrombie, Bakhshiansohi, Agarwal, et al.

2019

Automatic Log Analysis With NLP for the CMS Workflow Handling [Slides]

The central Monte-Carlo production of the CMS experiment utilizes the WLCG infrastructure and manages daily thousands of tasks, each up to thousands of jobs. The distributed computing system is bound to sustain a certain rate of failures of various types, which are currently handled by computing operators a posteriori. Within the context of computing operations, and operation intelligence, we propose a machine learning technique to learn from the operators with a view to reduce the operational workload and delays. This work is in continuation of CMS work on operation intelligence to try and reach accurate predictions with machine learning. We present an approach to consider the log files of the workflows as regular text to leverage modern techniques from natural language processing (NLP). In general, log files contain a substantial amount of text that is not human language. Therefore, different log parsing approaches are studied in order to map the log files’ words to high dimensional vectors. These vectors are then exploited as feature space to train a model that predicts the action that the operator has to take. This approach has the advantage that the information of the log files is extracted automatically and the format of the logs can be arbitrary. In this work the performance of the log file analysis with NLP is presented and compared to previous approaches.

Automatic log analysis with NLP for the CMS workflow handling

et al. 2020

The central Monte-Carlo production of the CMS experiment utilizes the WLCG infrastructure and manages daily thousands of tasks, each up to thousands of jobs. The distributed computing system is bound to sustain a certain rate of failures of various types, which are currently handled by computing operators a posteriori. Within the context of computing operations, and operation intelligence, we propose a Machine Learning technique to learn from the operators with a view to reduce the operational workload and delays. This work is in continuation of CMS work on operation intelligence to try and reach accurate predictions with Machine Learning. We present an approach to consider the log files of the workflows as regular text to leverage modern techniques from Natural Language Processing (NLP). In general, log files contain a substantial amount of text that is not human language. Therefore, different log parsing approaches are studied in order to map the log files’ words to high dimensional vectors. These vectors are then exploited as feature space to train a model that predicts the action that the operator has to take. This approach has the advantage that the information of the log files is extracted automatically and the format of the logs can be arbitrary. In this work the performance of the log file analysis with NLP is presented and compared to previous approaches.
