Abstract. NoSQL is the current buzzword in the IT world. Major players such as Facebook, Yahoo and Google have widely adopted different "NoSQL" solutions for their needs. Horizontal scalability, a flexible data model and the management of big data volumes are only a few of the advantages of NoSQL. In the CMS experiment we use several such solutions in our production environment. Here we present CMS projects based on NoSQL solutions, their strengths and weaknesses, as well as our experience with these tools and their coexistence with standard RDBMS solutions in our applications.
Introduction
Nowadays the IT infrastructure of any large organization is quite complex by default. It usually consists of heterogeneous networks, resides in different data centers and relies heavily on database technology to hold corporate data. Recently a new trend, the so-called NoSQL [1] solutions, has joined this conglomerate. More and more companies are adopting them to solve growing problems with data scalability. The range of NoSQL solutions varies from key-value stores to graph databases. Even though their advantages are controversial, their penetration into IT infrastructure is obvious.
The High Energy Physics community is no exception. Today CERN attracts most of the attention. It hosts four large experiments at the LHC that search for new physics and test our knowledge of the Standard Model. One of them is the Compact Muon Solenoid (CMS) [2]. More than three thousand physicists are involved in the sophisticated research program at CMS, which yields a few PB of data each year. To handle this amount of data, the CMS experiment relies on hundreds of computing centers around the world, interconnected by the GRID infrastructure [3].
The CMS software has been around for ten years and is still constantly evolving along with the experiment's requirements. Today it has more than 4M lines of C++ code (CMSSW release), 2M lines of Python code (framework configuration, web and data management stack), as well as various applications written in Java and Perl (mostly web and data services). Major CMS data services, such as Tier0 [4], PhEDEx [5], Data Bookkeeping System [6] and Run Summary [7], are based on a standard 3-tier architecture and rely on ORACLE RAC. Combined, they accumulate ∼O(100 GB) of meta-data every month. Although this infrastructure served us well in the first couple of years of data taking, application growth, the scale of the data and changing requirements forced CMS developers to bring NoSQL solutions into the CMS software stack.
Here we discuss reasons behind this choice and our experience with