The HPC community is actively researching and evaluating tools to support the execution of scientific applications in cloud-based environments. Among the various technologies, containers have recently gained importance because they perform significantly better than full-scale virtualization, support microservices and DevOps, and work seamlessly with workflow and orchestration tools. Docker is currently the leader in containerization technology because it offers low overhead, flexibility, application portability, and reproducibility. Singularity is another container solution of interest, as it is designed specifically for scientific applications. It is important to analyze the performance and features of container technologies to understand their applicability to each application and target execution environment. This paper presents (1) a performance evaluation of Docker and Singularity on bare-metal nodes in the Chameleon cloud, (2) a mechanism by which Docker containers can be mapped to InfiniBand hardware with RDMA communication, and (3) an analysis of mapping elements of parallel workloads to containers for optimal resource management with container-ready orchestration tools. Our experiments are targeted toward application developers so that they can make informed decisions when choosing the container technologies and approaches that suit their HPC workloads on cloud infrastructure. Our performance analysis shows that scientific workloads in both Docker- and Singularity-based containers can achieve near-native performance. Although Singularity is designed specifically for HPC workloads, Docker still has advantages over Singularity for use in clouds, as it provides overlay networking and an intuitive way to run MPI applications with one container per rank for fine-grained resource allocation.
Both Docker and Singularity make it possible to use the underlying network fabric directly from the containers for coarse-grained resource allocation.

CCS Concepts: • Hardware → Networking hardware; • Software and its engineering → Application specific development environments; • General and reference → Performance

KEYWORDS: Docker, Singularity, scientific workloads

ACM Reference Format: P. Saha et al.

As these container-ready orchestration tools support Docker, containerized HPC applications can also take advantage of features such as container migration, resource fairness, and fault tolerance. Singularity is designed to use the underlying HPC runtime environment for executing MPI applications, whereas Docker is designed to isolate the runtime environment from the host. Also, Singularity focuses on coarse-grained resource allocation, whereas Docker can take advantage of fine-grained allocation of resources per rank. HPC centers and academic clusters currently do not widely support Docker because of reported security concerns that root privilege escalation is possible. However, this vulnerability is not a concern in cloud allocations, where users have root privileges to run their applications and other security modules provide...
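The contrast between one container per MPI rank (the fine-grained model attributed to Docker above) and one container per node (the coarse-grained model) can be illustrated with a small planning function. This is a hypothetical sketch, not the paper's mechanism: the `Container` record, `plan_containers`, and the block rank-to-node placement order are assumptions for illustration, and the code only computes a placement plan rather than launching any containers.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Container:
    """One container in a placement plan (hypothetical model, not a Docker API)."""
    node: str
    ranks: Tuple[int, ...]  # MPI ranks hosted inside this container
    cpus: int               # cores reserved for the container


def plan_containers(nodes: Dict[str, int], num_ranks: int,
                    per_rank: bool) -> List[Container]:
    """Assign MPI ranks to nodes and group them into containers.

    nodes maps a node name to its free cores; ranks fill one node before
    moving on to the next (block placement).
    per_rank=True  -> fine-grained: one single-core container per rank.
    per_rank=False -> coarse-grained: one container per node holding all its ranks.
    """
    if num_ranks > sum(nodes.values()):
        raise ValueError("not enough free cores for the requested ranks")
    plan: List[Container] = []
    next_rank = 0
    for node, cores in nodes.items():
        if next_rank == num_ranks:
            break  # every rank has been placed
        take = min(cores, num_ranks - next_rank)
        if take == 0:
            continue  # node has no free cores
        ranks = tuple(range(next_rank, next_rank + take))
        next_rank += take
        if per_rank:
            plan.extend(Container(node, (r,), 1) for r in ranks)
        else:
            plan.append(Container(node, ranks, take))
    return plan
```

With two 4-core nodes and 6 ranks, the coarse-grained plan yields two containers (ranks 0 to 3 on the first node, ranks 4 and 5 on the second), while the fine-grained plan yields six single-core containers that an orchestrator could size, schedule, or migrate individually.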
Apache Mesos, a cluster-wide resource manager, is widely deployed at massive scale in several clouds and data centers. Mesos aims to provide high cluster utilization via fine-grained resource co-scheduling, and resource fairness among multiple users through Dominant Resource Fairness (DRF)-based allocation. DRF takes into account the different resource types (CPU, memory, disk I/O) requested by each application and determines the share of each cluster resource that can be allocated to each application. Mesos adopts a two-level scheduling policy: (1) DRF allocates resources to competing frameworks, and (2) each framework performs task-level scheduling on the resources allocated in the previous step. We have conducted experiments on a local Mesos cluster used with frameworks such as Apache Aurora, Marathon, and our own framework, Scylla, to study resource fairness and cluster utilization. Experimental results show how informed decisions about the frameworks' second-level scheduling policy and about attributes such as the offer holding period, offer refusal cycle, and task arrival rate can reduce unfair resource distribution. A bin-packing scheduling policy on Scylla with Marathon can reduce unfair allocation from 38% to 3%. By reducing unused free resources in offers, we bring unfairness down from 90% to 28%. We also show how adjusting the task arrival rate can reduce unfairness from 23% to 7%.
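The DRF step can be sketched as progressive filling: the user with the lowest dominant share (the largest fraction of any single resource it holds) is granted its next task, and the process repeats until no user's next task fits. This is a simplified, static sketch, assuming fixed per-task demand vectors and no task completions; `drf_allocate` is a hypothetical name and this is not the Mesos allocator's code. In Mesos, this first level decides which framework receives resources, and the framework's own second-level policy (e.g., bin-packing) then places tasks on them.

```python
from typing import Dict


def drf_allocate(capacity: Dict[str, float],
                 demands: Dict[str, Dict[str, float]]) -> Dict[str, int]:
    """Progressive-filling DRF over a fixed pool of resources.

    capacity: total amount of each resource in the cluster.
    demands:  per-user resource vector of a single task (missing resources = 0).
    Returns the number of tasks granted to each user.
    """
    for user, need in demands.items():
        if not any(need.get(r, 0.0) > 0 for r in capacity):
            raise ValueError(f"user {user!r} must request some resource")
    used = {r: 0.0 for r in capacity}
    tasks = {u: 0 for u in demands}
    blocked = set()

    def dominant_share(user: str) -> float:
        # Largest fraction of any single resource currently held by this user.
        return max(tasks[user] * demands[user].get(r, 0.0) / capacity[r]
                   for r in capacity)

    while len(blocked) < len(demands):
        # Lowest dominant share goes first; ties are broken by user name.
        user = min((u for u in demands if u not in blocked),
                   key=lambda u: (dominant_share(u), u))
        need = demands[user]
        if all(used[r] + need.get(r, 0.0) <= capacity[r] for r in capacity):
            for r in capacity:
                used[r] += need.get(r, 0.0)
            tasks[user] += 1
        else:
            # Nothing is ever released in this static sketch, so this user's
            # next task will never fit; stop considering it.
            blocked.add(user)
    return tasks
```

On a toy cluster of 9 CPUs and 18 GB of memory, with user A requesting 1 CPU and 4 GB per task and user B requesting 3 CPUs and 1 GB per task, the sketch grants A three tasks and B two, equalizing their dominant shares (memory for A, CPU for B) at 2/3.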
SUMMARY
Science Gateways provide scientists with tools for creating, executing, and monitoring scientific experiments on multiple resource infrastructures. Apache Airavata abstracts interactions between gateways and distributed computing infrastructures such as the Extreme Science and Engineering Discovery Environment (XSEDE), international grids, and campus clusters. Airavata consists of several component services, such as the API server, Orchestrator, Workflow Interpreter, Credential Store, and Application Factory. In addition, Airavata uses third-party software, including RabbitMQ for messaging, MySQL for production database management, and Apache ZooKeeper for internal communication. In this paper, we discuss our initial experiences with leveraging open source technologies to deploy Airavata and its dependent components, and to detect and restart failed components, on an auto-scaling platform. Such capabilities will allow Airavata services to be deployed across a wide area, on a large Virtual Machine (VM)-based cluster, or on a developer's laptop. The emerging cloud computing and Big Data technologies that address these needs are Docker, Marathon, and Apache Mesos. Docker is a lightweight Linux container technology that allows different applications to run in isolation from each other while safely sharing the machine's resources. Docker images of applications can be published in registries and retrieved for execution on the target infrastructure. Marathon provides a cluster-wide init and control system for services, including Docker containers. Mesos provides a cluster-wide framework for scheduling tasks based on fine-grained resource needs. Mesosphere provides the packages, scripts, and web interface that ease the use of these technologies. We present the design, experience, and lessons learned from integrating Mesos, Docker, and Marathon with Apache Airavata.
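To make the Marathon role concrete, the sketch below builds an app definition that runs one Airavata component as a Docker container, with a health check so that Marathon can detect and restart a failed instance. It is an illustration under stated assumptions, not the paper's configuration: the app id, image name, port, and resource sizes are hypothetical, the helper names are invented, and the nested `docker.network`/`portMappings` layout follows the pre-1.5 Marathon app schema (later Marathon releases moved port mappings to `container.portMappings`). The request is only prepared, not sent.

```python
import json
import urllib.request


def marathon_app(app_id: str, image: str, port: int) -> dict:
    """Build a Marathon app definition for one Dockerized service.

    Field names follow Marathon's /v2/apps JSON format (pre-1.5 layout, where
    network and portMappings sit under container.docker); values are illustrative.
    """
    return {
        "id": app_id,
        "instances": 1,   # Marathon relaunches tasks to keep this many running
        "cpus": 0.5,
        "mem": 512,       # MB
        "container": {
            "type": "DOCKER",
            "docker": {
                "image": image,
                "network": "BRIDGE",
                # hostPort 0 lets Marathon pick a free host port
                "portMappings": [{"containerPort": port, "hostPort": 0}],
            },
        },
        # A task that fails 3 consecutive TCP checks is killed and replaced.
        "healthChecks": [{
            "protocol": "TCP",
            "portIndex": 0,
            "gracePeriodSeconds": 60,
            "intervalSeconds": 20,
            "maxConsecutiveFailures": 3,
        }],
    }


def build_deploy_request(marathon_url: str, app: dict) -> urllib.request.Request:
    """Prepare, but do not send, a POST of the app definition to /v2/apps."""
    return urllib.request.Request(
        marathon_url.rstrip("/") + "/v2/apps",
        data=json.dumps(app).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

For example, `build_deploy_request("http://marathon.example:8080", marathon_app("/airavata/api-server", "example/airavata-api-server:latest", 9090))` targets a hypothetical Marathon endpoint; passing the result to `urllib.request.urlopen` against a live Marathon would submit it. Keeping each component as a separate app lets Marathon restart a failed service (for example, the API server) without touching the others.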