Virtual machine (VM) consolidation has become a common practice in clouds, Grids, and datacenters. While this practice leads to higher CPU utilization, we observe its negative impact on the TCP throughput of the consolidated VMs: as more VMs share the same core/CPU, the CPU scheduling latency for each VM increases significantly. This increase slows the progress of TCP transmissions to the VMs. To address this problem, we propose an approach called vSnoop, in which the driver domain of a host acknowledges TCP packets on behalf of the guest VMs whenever it is safe to do so. Our evaluation of a Xen-based prototype indicates that vSnoop consistently improves TCP throughput for VMs (by orders of magnitude in some scenarios). We further show that the higher TCP throughput leads to improvement in application-level performance, via experiments with a two-tier online auction application and two suites of MPI benchmarks.
A virtual networked environment (VNE) consists of virtual machines (VMs) connected by a virtual network. It has been adopted to create "virtual infrastructures" for individual users on a shared cloud computing infrastructure. The ability to take snapshots of an entire VNE (including images of the VMs with their execution, communication and storage states) yields a unique approach to reliability as a snapshot can restore the operation of an entire virtual infrastructure. We present VNsnap, a system that takes distributed snapshots of VNEs. Unlike existing distributed snapshot/checkpointing solutions, VNsnap does not require any modifications to the applications, libraries, or (guest) operating systems running in the VMs. Furthermore, VNsnap incurs only seconds of downtime as much of the snapshot operation takes place concurrently with the VNE's normal operation. We have implemented VNsnap on top of Xen. Our experiments with real-world parallel and distributed applications demonstrate VNsnap's effectiveness and efficiency.
Virtualization is a key technology that powers cloud computing platforms such as Amazon EC2. Virtual machine (VM) consolidation, where multiple VMs share a physical host, has seen rapid adoption in practice, with an increasingly large number of VMs per machine and per CPU core. Our investigations, however, suggest that the increasing degree of VM consolidation has serious negative effects on the VMs' TCP transport performance. As multiple VMs share a given CPU, the scheduling latencies, which can be on the order of tens of milliseconds, substantially increase the typically sub-millisecond round-trip times (RTTs) for TCP connections in a datacenter, causing significant degradation in throughput. In this paper, we propose a lightweight solution called vFlood that (a) allows a TCP sender VM to opportunistically flood the driver domain in the same host, and (b) offloads the VM's TCP congestion control function to the driver domain in order to mask the effects of VM consolidation. Our evaluation of a vFlood prototype on Xen suggests that vFlood substantially improves TCP transmit throughput with minimal per-packet CPU overhead. Further, our application-level evaluation using Apache Olio, a Web 2.0 cloud application, indicates a 33% improvement in the number of operations per second.
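The link between scheduling latency and throughput described above can be illustrated with a back-of-the-envelope calculation. The sketch below is not taken from the paper: it assumes a simple window-limited model in which a sender transmits at most one window per RTT, and the window size and latency values are illustrative assumptions chosen to match the "sub-millisecond" and "tens of milliseconds" ranges mentioned in the abstract.

```python
def window_limited_throughput_mbps(window_bytes: int, rtt_seconds: float) -> float:
    """Upper bound on throughput (Mbit/s) when one window is sent per RTT."""
    return window_bytes * 8 / rtt_seconds / 1e6


# All three values below are assumptions for illustration, not measurements.
WINDOW_BYTES = 64 * 1024   # assumed 64 KiB TCP window
BASE_RTT = 0.0005          # assumed 0.5 ms datacenter RTT
SCHED_LATENCY = 0.030      # assumed 30 ms VM scheduling delay added to the RTT

baseline = window_limited_throughput_mbps(WINDOW_BYTES, BASE_RTT)
consolidated = window_limited_throughput_mbps(WINDOW_BYTES, BASE_RTT + SCHED_LATENCY)
slowdown = baseline / consolidated

print(f"baseline bound:     {baseline:.1f} Mbit/s")
print(f"consolidated bound: {consolidated:.1f} Mbit/s")
print(f"slowdown factor:    {slowdown:.0f}x")
```

Under these assumed numbers the effective RTT grows from 0.5 ms to 30.5 ms, so the window-limited bound drops by a factor of about 61. Real TCP behavior (slow start, loss recovery, window scaling) is more involved, so this is only a rough indication of why masking the scheduling delay matters.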