The CMS DBS query language

Kuznetsov, Valentin; Riley, D.; Afaq, A.; Sekhri, V.; Guo, Yuyi; Lueking, L.

doi:10.1088/1742-6596/219/4/042043

Cited by 6 publications

(4 citation statements)

References 2 publications


“…Moreover, the ACID properties of the back-end can be replaced with BASE (basically available, soft state, eventually consistent) alternative, which is more suitable for this case. Even though we had prior experience with usability studies for data discovery service [15,16], we found that experiment requirements are constantly changing. Such changes were often required schema modifications as well as adaptation of SQL queries.…”

Section: DAS and MongoDB (mentioning)

confidence: 99%

Life in extra dimensions of database world or penetration of NoSQL in HEP community

Kuznetsov, Evans, Metson 2012

J. Phys.: Conf. Ser.


Abstract. The recent buzzword in the IT world is NoSQL. Major players such as Facebook, Yahoo and Google have widely adopted different "NoSQL" solutions for their needs. Horizontal scalability, a flexible data model and management of big data volumes are only a few advantages of NoSQL. In the CMS experiment we use several of them in a production environment. Here, we present CMS projects based on NoSQL solutions, their strengths and weaknesses, as well as our experience with those tools and their coexistence with standard RDBMS solutions in our applications.

Introduction. Nowadays the IT infrastructure of any large organization is quite complex by default. It usually consists of heterogeneous networks, resides in different data centers and relies heavily on database technology to hold corporate data. Recently a new trend, the so-called NoSQL [1] solutions, has joined this conglomerate. More and more companies are adding them to solve growing problems with data scalability. NoSQL solutions range from key-value stores to graph databases. Even though their advantages are controversial, their penetration into IT infrastructure is obvious. The High Energy Physics community is no exception. Today CERN attracts most of the attention. It hosts four large experiments at the LHC searching for new physics and testing our knowledge of the Standard Model. One of them is the Compact Muon Solenoid (CMS) [2]. More than three thousand physicists are involved in the sophisticated research program at CMS, which yields a few PB of data each year at their disposal. To handle this amount of data, the CMS experiment relies on hundreds of computing centers around the world interconnected by GRID infrastructure [3]. The CMS software has been around for ten years and is still constantly evolving along with experiment requirements. Today it has more than 4M lines of C++ code (CMSSW release), 2M lines of Python code (framework configuration, web and data management stack), as well as various applications written in Java and Perl (mostly web and data services). Major CMS data services, such as Tier0 [4], PhEDEx [5], Data Bookkeeping System [6] and Run Summary [7], are based on a standard 3-tier architecture and rely on ORACLE RAC. Combined, they accumulate ∼O(100GB) of meta-data every month. Although this infrastructure served us well in the first couple of years of data taking, application growth, big data scale and changing requirements forced CMS developers to bring NoSQL solutions into the CMS software stack. Here we discuss reasons behind this choice and our experience with



“…DAS queries are made in a custom text-based language. Work originally centred on an extended version of the query language already developed for the CMS Dataset Bookkeeping Service [6] but the syntax was found unsuitable. The DAS syntax broadly resembles using pipes and commands in a UNIX shell, albeit with entirely dissimilar implementation.…”

Section: DAS Query Language (mentioning)

confidence: 99%

Data Aggregation System - a system for information retrieval on demand over relational and non-relational distributed data sources

Ball, Kuznetsov, Evans et al. 2011

J. Phys.: Conf. Ser.


We present the Data Aggregation System, a system for information retrieval and aggregation from heterogeneous sources of relational and non-relational data for the Compact Muon Solenoid experiment on the CERN Large Hadron Collider. The experiment currently has a number of organically developed data sources, including front-ends to a number of different relational databases and non-database data services which do not share common data structures or APIs (Application Programming Interfaces), and cannot at this stage be readily converged. DAS provides a single interface for querying all these services, a caching layer to speed up access to expensive underlying calls and the ability to merge records from different data services pertaining to a single primary key.


“…The data transfer and location services are handled by Rucio [6]. DBS [7] is the Data Bookkeeping Service, a metadata catalog. DAS [8], the Data Aggregation Service, is designed to aggregate views and provide them to users and services.…”

Section: Introduction (mentioning)

confidence: 99%

The CMS monitoring applications for LHC Run 3

Jashal, Kuznetsov, Legger et al. 2024

EPJ Web of Conf.

Self Cite


Data taking at the Large Hadron Collider (LHC) at CERN restarted in 2022. The CMS experiment relies on a distributed computing infrastructure based on WLCG (Worldwide LHC Computing Grid) to support the LHC Run 3 physics program. The CMS computing infrastructure is highly heterogeneous and relies on a set of centrally provided services, such as distributed workload management and data management, and computing resources hosted at almost 150 sites worldwide. Smooth data taking and processing requires all computing subsystems to be fully operational, and available computing and storage resources need to be continuously monitored. During the long shutdown between LHC Run 2 and Run 3, the CMS monitoring infrastructure has undergone major changes to increase the coverage of monitored applications and services, while becoming more sustainable and easier to operate and maintain. The used technologies are based on open-source solutions, either provided by the CERN IT department through the MONIT infrastructure, or managed by the CMS monitoring team. Monitoring applications for distributed workload management, submission infrastructure based on HTCondor, distributed data management, facilities have been ported from mostly custom-built applications to use common data flow and visualization services. Data are mostly stored in non-SQL databases and storage technologies such as ElasticSearch, VictoriaMetrics, Prometheus, InfluxDB and HDFS, and accessed either via programmatic APIs, Apache Spark or Sqoop jobs, or visualized preferentially using Grafana. Most CMS monitoring applications are deployed on Kubernetes clusters to minimize maintenance operations. In this contribution we present the full stack of CMS monitoring services and show how we leveraged the use of common technologies to cover a variety of monitoring applications and cope with the computing challenges of LHC Run 3.

