Containers have become the de facto standard for packaging and distributing modern applications together with their dependencies. The high-energy physics (HEP) community shows growing interest in this technology, with scientists encapsulating their analysis workflows and code in container images. An analysis is first validated on a small dataset with minimal hardware resources and then run at scale on the massive computing capacity provided by the grid. The typical way to distribute containers is to pull the image from a remote registry and extract it on each node where the container runtime (e.g., Docker, Singularity) runs. This approach, however, does not scale easily to large images and thousands of nodes. The CernVM File System (CVMFS) has long been used to distribute software directory trees efficiently at a global scale. To extend its optimized caching and network utilization to container distribution, CVMFS recently introduced a dedicated container image ingestion service together with integrations for container runtimes. CVMFS ingestion deduplicates individual files, rather than whole layers as traditional container registries do. On the client side, CVMFS fetches on demand only the chunks needed to execute the container, instead of the whole image.
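As a rough illustration of why the deduplication granularity matters, the toy sketch below compares the bytes stored under per-layer and per-file deduplication for two single-layer images that differ in only one small file. It is a simplified model, not CVMFS code: layers are plain dictionaries, `layer_digest` is a hypothetical stand-in for a registry's layer digest (real registries hash the layer archive itself), and the chunking of large files is ignored.

```python
import hashlib

# Hypothetical toy model: a layer maps file paths to file contents.
Layer = dict[str, bytes]


def sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def layer_digest(layer: Layer) -> str:
    # Stand-in for a whole-layer digest: changing any file yields a new digest.
    h = hashlib.sha256()
    for path in sorted(layer):
        h.update(path.encode())
        h.update(b"\0")
        h.update(sha256(layer[path]).encode())
    return h.hexdigest()


def per_layer_store(images: list[list[Layer]]) -> int:
    """Bytes stored when whole layers are deduplicated by digest."""
    store: dict[str, int] = {}
    for image in images:
        for layer in image:
            store[layer_digest(layer)] = sum(len(c) for c in layer.values())
    return sum(store.values())


def per_file_store(images: list[list[Layer]]) -> int:
    """Bytes stored when individual files are deduplicated by content hash."""
    store: dict[str, int] = {}
    for image in images:
        for layer in image:
            for content in layer.values():
                store[sha256(content)] = len(content)
    return sum(store.values())


# Two images whose single layer differs only in one small config file.
base = {"/usr/lib/libbig.so": b"x" * 1000, "/usr/bin/tool": b"y" * 200}
image_a = [{**base, "/etc/app.conf": b"mode=a"}]
image_b = [{**base, "/etc/app.conf": b"mode=b"}]

print(per_layer_store([image_a, image_b]))  # 2412
print(per_file_store([image_a, image_b]))   # 1212
```

Because the two layers differ in one file, per-layer deduplication keeps both layers in full, while per-file deduplication stores the shared files only once. In a content-addressed design like this one, a client can likewise request an individual object by its hash only when that file is first accessed.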
The CernVM File System (CernVM-FS) is a global, read-only POSIX file system that provides scalable and reliable software distribution to numerous scientific collaborations. It gives end-user devices, grids, clouds, and supercomputers access to more than a billion binary files, spanning experiment application software stacks and operating system containers. CernVM-FS is asymmetric by construction: writing to a repository is a centralized operation called publishing, while many clients can read from many locations. The classic publishing process requires a dedicated “release manager machine” that provides the editable copy of the repository. This approach was improved by the introduction of the CernVM-FS Gateway, which provides concurrent access to the repository's backend storage through a REST API. In this contribution, we present further improvements to the CernVM-FS publishing process. Our main contribution is the construction of ephemeral containers, created on demand, that provide a temporary, editable repository copy for a single publish operation. The container construction makes careful use of Linux namespaces and a user-space implementation of overlayfs. We further show that both the gateway and the publishing containers can be instantiated as pods in a Kubernetes cluster, thereby demonstrating a Kubernetes-native CernVM-FS publishing workflow.
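The abstract does not spell out how the editable copy works internally. Assuming the usual overlay semantics (a read-only lower layer, a writable upper layer, and whiteout markers for deletions), the toy in-memory model below shows why the upper layer alone captures the changes that a single publish operation has to send. `OverlayView` and its methods are hypothetical; this is not kernel overlayfs, not the user-space implementation used in this work, and not the CernVM-FS Gateway API.

```python
from collections.abc import Mapping
from types import MappingProxyType


class OverlayView:
    """Toy union view: writes go to an upper layer; the lower layer stays read-only."""

    def __init__(self, lower: Mapping[str, bytes]) -> None:
        self._lower = MappingProxyType(dict(lower))  # read-only snapshot of the repository
        self._upper: dict[str, bytes] = {}           # scratch area for this publish only
        self._whiteouts: set[str] = set()            # paths deleted relative to the lower layer

    def read(self, path: str) -> bytes:
        if path in self._upper:
            return self._upper[path]                 # upper layer shadows the lower one
        if path in self._whiteouts or path not in self._lower:
            raise FileNotFoundError(path)
        return self._lower[path]

    def write(self, path: str, data: bytes) -> None:
        self._whiteouts.discard(path)                # re-creating a deleted path undoes the whiteout
        self._upper[path] = data

    def delete(self, path: str) -> None:
        if path not in self._upper and (path in self._whiteouts or path not in self._lower):
            raise FileNotFoundError(path)
        self._upper.pop(path, None)
        if path in self._lower:
            self._whiteouts.add(path)                # hide the lower-layer file

    def changeset(self) -> tuple[dict[str, bytes], set[str]]:
        """Return (files to add or replace, paths to remove) relative to the lower layer."""
        return dict(self._upper), set(self._whiteouts)


repo = {"/sw/v1/bin/app": b"v1", "/sw/README": b"old"}
view = OverlayView(repo)
view.write("/sw/v2/bin/app", b"v2")
view.write("/sw/README", b"new")
view.delete("/sw/v1/bin/app")

print(view.read("/sw/README"))  # b'new'
print(view.changeset())         # files to add or replace, paths to remove
print(repo["/sw/README"])       # b'old'
```

Discarding the view after publishing leaves the lower layer untouched, which is what makes a per-publish, throwaway editable copy cheap in this model.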