Particle physics has an ambitious and broad experimental programme for the coming decades. This programme requires large investments in detector hardware, either to build new facilities and experiments or to upgrade existing ones. Similarly, it requires commensurate investment in the R&D of software to acquire, manage, process, and analyse the sheer amounts of data to be recorded. In planning for the HL-LHC in particular, it is critical that all of the collaborating stakeholders agree on the software goals and priorities, and that the efforts complement each other. In this spirit, this white paper describes the R&D activities required to prepare for this software upgrade.
The Compact Muon Solenoid (CMS) experiment relies heavily on the CMSWEB cluster to host critical services for its operational needs. The cluster is deployed on virtual machines (VMs) from the CERN OpenStack cloud and is manually maintained by operators and developers. The release cycle is composed of several steps, from building RPMs to their deployment, validation, and integration tests. To enhance the sustainability of the CMSWEB cluster, CMS decided to migrate it to a containerized solution based on Docker and orchestrated with Kubernetes (K8s). This allows us to significantly speed up the release upgrade cycle, follow an end-to-end deployment procedure, and reduce operational cost. In this paper, we give an overview of the CMSWEB VM cluster and the issues we discovered during the migration. We discuss the architecture and implementation strategy of the CMSWEB Kubernetes cluster. Kubernetes natively provides horizontal pod autoscaling based on CPU and memory usage; in this paper, we describe horizontal pod autoscaling based on custom metrics of CMSWEB services. We also discuss an automated deployment procedure based on continuous integration/continuous deployment (CI/CD) best practices. We present a performance comparison of Kubernetes-based and VM-based CMSWEB deployments. Finally, we describe various issues found during the Kubernetes implementation and report on lessons learned during the migration process.
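The abstract does not specify the CMSWEB custom metrics or autoscaler configuration. Whatever metric is used, the Kubernetes horizontal pod autoscaler turns the observed per-pod value into a replica count with a proportional rule: desiredReplicas = ceil(currentReplicas × currentMetricValue / targetMetricValue). When the ratio falls within a tolerance band (0.1 by default), no scaling happens. The sketch below models only that rule and is a simplified illustration, not the CMSWEB implementation. The metric it scales on ("requests per second per pod", with a target of 100) and the `min_replicas`/`max_replicas` bounds are hypothetical. It also omits the real controller's stabilization windows, pod-readiness handling, and missing-metric handling.

```python
import math


def desired_replicas(current_replicas: int,
                     current_metric: float,
                     target_metric: float,
                     tolerance: float = 0.1,
                     min_replicas: int = 1,
                     max_replicas: int = 10) -> int:
    """Simplified HPA proportional rule for a per-pod custom metric.

    current_metric is the average metric value across pods (e.g. a
    hypothetical "requests per second per pod"); target_metric is the
    desired average value per pod.
    """
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    ratio = current_metric / target_metric
    # Within the tolerance band the autoscaler leaves the replica count unchanged.
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas
    else:
        # Scale proportionally to the metric ratio, rounding up.
        desired = math.ceil(current_replicas * ratio)
    # Clamp to the configured replica bounds.
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    # 3 pods averaging 150 req/s against a target of 100 req/s -> scale out.
    print(desired_replicas(3, 150.0, 100.0))
    # 3 pods at 105 req/s is within the 10% tolerance -> unchanged.
    print(desired_replicas(3, 105.0, 100.0))
```

For example, three pods averaging 150 requests per second against a target of 100 give ceil(3 × 1.5) = 5 replicas. At 105 requests per second the ratio of 1.05 falls inside the tolerance, so the deployment stays at three pods.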
The CMS experiment relies heavily on the CMSWEB cluster to host critical services for its operational needs. The cluster is deployed on virtual machines (VMs) from the CERN OpenStack cloud and is manually maintained by operators and developers. The release cycle is composed of several steps, from building RPMs to deploying them for validation and integration tests. To enhance the sustainability of the CMSWEB cluster, CMS decided to migrate it to a containerized solution based on Docker and orchestrated with Kubernetes (k8s). This allows us to significantly shorten the release upgrade cycle, follow an end-to-end deployment procedure, and reduce operational cost. This paper gives an overview of the current CMSWEB cluster and its issues. We describe the new architecture of the CMSWEB cluster in Kubernetes. We also compare the VM and Kubernetes deployment approaches and report on lessons learned during the migration process.