Abstract: The intensive growth of processing power, data storage and transmission capabilities has revolutionized many aspects of science. These resources are essential to achieve high-quality results in many application areas. In this context, the University of Luxembourg (UL) has operated, since 2007, a High Performance Computing (HPC) facility and the related storage with a very small team. Bridging computing and storage is a requirement of the UL service, for both legal reasons (certain data may not be moved) and performance reasons. Today, members of the three faculties and of the two interdisciplinary centers of the UL use this facility. In particular, key research priorities such as Systems Biomedicine (addressed by LCSB) and Security, Reliability & Trust (addressed by SnT) require access to such HPC facilities in order to work in an adequate environment. The management of HPC solutions is a complex enterprise and a constant subject of discussion and improvement. The UL HPC facility and the services deployed on top of it form a system that is complex to manage because of its scale: at the time of writing, it consists of 150 servers, 368 nodes (3880 computing cores) and 1996 TB of shared storage, all configured, monitored and operated by only three people using advanced IT automation solutions based on Puppet [1], FAI [2] and Capistrano [3]. This paper covers all aspects, technical or administrative, of managing such a complex infrastructure. Most design choices and implemented approaches have been motivated by several years of experience in addressing research needs, mainly in the HPC area but also in complementary services (typically Web-based). In this context, we have tried to address many technological issues in a flexible and convenient way.
This experience report may be of interest to other research centers and universities, in either the public or the private sector, that are looking for good, if not best, practices in cluster architecture and management.
SUMMARY: Virtualization is emerging as the prominent approach to mutualise the energy consumed by a single server running multiple Virtual Machine (VM) instances. Using virtualized servers and computing resources efficiently requires an understanding of the overheads in energy consumption and throughput, especially on demanding High Performance Computing (HPC) platforms. In this paper, we propose a novel holistic model for the power of virtualized computing nodes. We create and validate instances of this model using concrete measurements taken during a benchmarking campaign that reflects HPC usage (HPCC, IOZone and Bonnie++), conducted on the Grid'5000 platform with two different hardware configurations, based on Intel and AMD processors, and three widespread virtualization frameworks, namely Xen, KVM and VMware ESXi. The model accounts for the utilisation metrics of the machine's components as well as for the application, virtualization framework and hardware employed. It is further instantiated using techniques such as multiple linear regression and neural networks, which demonstrate its elasticity, applicability and accuracy. The model is intended to estimate the energy consumption of virtualized platforms, in order to enable optimization, scheduling or accounting in such systems, as well as their simulation.
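The multiple-linear-regression variant of such a power model can be illustrated with a generic sketch. It assumes, purely for illustration, that node power is a linear function of four component utilisation metrics (CPU, memory, disk, network, each in [0, 1]) plus an intercept standing for idle power. The feature set, the coefficients and the function names `fit_linear_power_model` and `predict_power` are hypothetical and are not the authors' model; the data are synthetic, generated from known coefficients so that the fit can be checked.

```python
import random

def fit_linear_power_model(samples, powers):
    """Ordinary least squares via the normal equations (X^T X) b = X^T y.

    samples: list of utilisation vectors, e.g. [cpu, mem, disk, net] in [0, 1].
    powers:  node power in watts for each sample.
    Returns [b0, b1, ..., bk]; b0 is the intercept (idle-power term).
    """
    # Design matrix with a leading column of ones for the intercept.
    X = [[1.0] + list(s) for s in samples]
    k = len(X[0])
    # Augmented normal-equation system [X^T X | X^T y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * y for row, y in zip(X, powers))]
         for i in range(k)]
    # Gaussian elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, k):
            factor = A[r][col] / A[col][col]
            for c in range(col, k + 1):
                A[r][c] -= factor * A[col][c]
    # Back substitution.
    beta = [0.0] * k
    for i in reversed(range(k)):
        acc = A[i][k] - sum(A[i][j] * beta[j] for j in range(i + 1, k))
        beta[i] = acc / A[i][i]
    return beta

def predict_power(beta, sample):
    """Estimated power (W) for one utilisation vector."""
    return beta[0] + sum(b * u for b, u in zip(beta[1:], sample))

# Synthetic data only: "true" coefficients are illustrative, not measured values.
rng = random.Random(0)
true_beta = [100.0, 120.0, 15.0, 8.0, 5.0]  # idle, cpu, mem, disk, net (hypothetical)
samples = [[rng.random() for _ in range(4)] for _ in range(50)]
powers = [predict_power(true_beta, s) for s in samples]
beta = fit_linear_power_model(samples, powers)
print([round(b, 3) for b in beta])
```

Categorical factors such as the hypervisor or processor family could be added as extra 0/1 indicator columns in each sample. Pure Python keeps the sketch dependency-free; with real, noisy measurements a library least-squares routine would be more numerically robust than the normal equations.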
With growing concern over the considerable energy consumed by HPC platforms and data centers, research efforts are targeting green approaches with higher energy efficiency. In particular, virtualization is emerging as the prominent approach to mutualize the energy consumed by a single server running multiple VM instances. Even today, it remains unclear whether the overhead induced by virtualization and the corresponding hypervisor middleware suits an environment as demanding as an HPC platform. In this paper, we analyze from an HPC perspective the three most widespread virtualization frameworks, namely Xen, KVM and VMware ESXi, and compare them with a baseline environment running in native mode. We performed our experiments on the Grid'5000 platform by measuring the performance of the reference HPL benchmark. Power measurements were taken in parallel to quantify the potential energy efficiency of the virtualized environments. Overall, our study provides new arguments in favor of in-house HPC platforms running without any virtualization framework.
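One common way to turn parallel power measurements and an HPL result into an energy-efficiency figure is performance per watt (GFlop/s per W), the metric used by the Green500 list. A minimal sketch, assuming power is sampled as (timestamp in s, power in W) pairs over the HPL run; the function names are hypothetical, and the trace in the usage is a placeholder, not data from the study.

```python
def energy_joules(timestamps, watts):
    """Trapezoidal integration of a sampled power trace (s, W) -> energy in J."""
    if len(timestamps) != len(watts) or len(timestamps) < 2:
        raise ValueError("need at least two matching (time, power) samples")
    return sum((t1 - t0) * (p0 + p1) / 2.0
               for t0, t1, p0, p1 in zip(timestamps, timestamps[1:],
                                         watts, watts[1:]))

def perf_per_watt(gflops, timestamps, watts):
    """HPL performance (GFlop/s) divided by mean power over the run (W)."""
    duration = timestamps[-1] - timestamps[0]
    mean_power = energy_joules(timestamps, watts) / duration
    return gflops / mean_power

def relative_overhead(native, virtualized):
    """Fractional loss of a higher-is-better metric vs. the native baseline."""
    return (native - virtualized) / native

# Placeholder trace (not measured data): constant 200 W for 4 s, 100 GFlop/s run.
ts = [0.0, 1.0, 2.0, 3.0, 4.0]
pw = [200.0] * 5
print(energy_joules(ts, pw))         # 800.0
print(perf_per_watt(100.0, ts, pw))  # 0.5
```

Computing `perf_per_watt` once for the native baseline and once per hypervisor, then passing both to `relative_overhead`, expresses the virtualization cost as a single fraction of the native efficiency.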