Energy proportionality and workload consolidation are important objectives towards increasing efficiency in large-scale datacenters. Our work focuses on achieving these goals in the presence of applications with µs-scale tail latency requirements. Such applications represent a growing subset of datacenter workloads and are typically deployed on dedicated servers, which is the simplest way to ensure low tail latency across all loads. Unfortunately, it also leads to low energy efficiency and low resource utilization during the frequent periods of medium or low load. We present the OS mechanisms and dynamic control needed to adjust core allocation and voltage/frequency settings based on the measured delays for latency-critical workloads. This allows for energy proportionality and frees the maximum amount of resources per server for other background applications, while respecting service-level objectives. Monitoring hardware queue depths allows us to detect increases in queuing latencies. Carefully coordinated adjustments to the NIC's packet redirection table enable us to reassign flow groups between the threads of a latency-critical application in milliseconds without dropping or reordering packets. We compare the efficiency of our solution to the Pareto-optimal frontier of 224 distinct static configurations. Dynamic resource control saves 44%-54% of processor energy, which corresponds to 85%-93% of the Pareto-optimal upper bound. Dynamic resource control also allows background jobs to run at 32%-46% of their standalone throughput, which corresponds to 82%-92% of the Pareto bound.
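
The abstract above describes a controller that adjusts core allocation and voltage/frequency settings in response to measured queuing delay. A minimal sketch of one control step could look like the following. The thresholds, the frequency steps, the core limits, and the order in which frequency and cores are adjusted are all assumptions made for illustration; the abstract does not specify the actual policy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Allocation:
    cores: int      # cores given to the latency-critical application
    freq_mhz: int   # current frequency step (hypothetical values)


def next_allocation(alloc, queue_depth, *, high=8, low=1,
                    min_cores=1, max_cores=8,
                    freqs=(1200, 1800, 2400)):
    """Return the allocation for the next interval.

    Assumed policy (not taken from the paper): when the queue is deep,
    raise the frequency first and add a core once the frequency is at
    its maximum; when the queue is shallow, release a core first and
    lower the frequency once only the minimum number of cores is left.
    """
    step = freqs.index(alloc.freq_mhz)
    if queue_depth > high:
        if step < len(freqs) - 1:
            return Allocation(alloc.cores, freqs[step + 1])
        if alloc.cores < max_cores:
            return Allocation(alloc.cores + 1, alloc.freq_mhz)
    elif queue_depth < low:
        if alloc.cores > min_cores:
            return Allocation(alloc.cores - 1, alloc.freq_mhz)
        if step > 0:
            return Allocation(alloc.cores, freqs[step - 1])
    # Queue depth is within bounds, or no further adjustment is possible.
    return alloc
```

Calling `next_allocation` once per monitoring interval with the latest queue depth yields a simple feedback loop; cores released by the controller would then be available to background jobs.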
Datacenter-networking research requires tools to both generate traffic and accurately measure latency and throughput. While hardware-based tools have long existed commercially, they are primarily used to validate ASICs and lack flexibility, e.g., to study new protocols. They are also too expensive for academics. The recent development of kernel-bypass networking and advanced NIC features such as hardware timestamping has created new opportunities for accurate latency measurements. This paper compares these two approaches, and in particular whether commodity servers and NICs, when properly configured, can measure the latency distributions as precisely as specialized hardware. Our work shows that well-designed commodity solutions can capture subtle differences in the tail latency of stateless UDP traffic. We use hardware devices as the ground truth, both to measure latency and to forward traffic. We compare the ground truth with observations that combine five latency-measuring clients and five different port forwarding solutions and configurations. State-of-the-art software such as MoonGen that uses NIC hardware timestamping provides sufficient visibility into tail latencies to study the effect of subtle operating system configuration changes. We also observe that the kernel-bypass-based TRex software, which relies only on the CPU to timestamp traffic, can provide solid results when NIC timestamps are not available for a particular protocol or device.
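
Tail latency, as discussed in the abstract above, is usually reported as a high percentile of per-packet latencies. The sketch below shows one common way to derive it from paired send and receive timestamps, using the nearest-rank percentile definition. The function names and the timestamp representation are hypothetical and are not taken from MoonGen or TRex.

```python
import math


def latencies(tx_times, rx_times):
    """Per-packet latency from paired send/receive timestamps."""
    return [rx - tx for tx, rx in zip(tx_times, rx_times)]


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample such that at
    least p percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]
```

For example, `percentile(latencies(tx, rx), 99)` gives the 99th-percentile latency of a run. The precision of the result depends on where the timestamps are taken (NIC hardware or CPU), which is the comparison the abstract is concerned with.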
The conventional wisdom is that aggressive networking requirements, such as high packet rates for small messages and μs-scale tail latency, are best addressed outside the kernel, in a user-level networking stack. We present IX, a dataplane operating system that provides high I/O performance and high resource efficiency while maintaining the protection and isolation benefits of existing kernels. IX uses hardware virtualization to separate management and scheduling functions of the kernel (control plane) from network processing (dataplane). The dataplane architecture builds upon a native, zero-copy API and optimizes for both bandwidth and latency by dedicating hardware threads and networking queues to dataplane instances, processing bounded batches of packets to completion, and eliminating coherence traffic and multicore synchronization. The control plane dynamically adjusts core allocations and voltage/frequency settings to meet service-level objectives. We demonstrate that IX outperforms Linux and a user-space network stack significantly in both throughput and end-to-end latency. Moreover, IX improves the throughput of a widely deployed, key-value store by up to 6.4× and reduces tail latency by more than 2×. With three varying load patterns, the control plane saves 46%-54% of processor energy, and it allows background jobs to run at 35%-47% of their standalone throughput.
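
The phrase "processing bounded batches of packets to completion" in the abstract above can be illustrated with a small sketch. This is a generic illustration of the idea, not IX code: the queue type, the handler, and the batch size of 64 are assumptions.

```python
from collections import deque


def run_to_completion(rx_queue, handle, batch_size=64):
    """Take at most batch_size packets from the receive queue and
    process each one fully before touching the next.

    Bounding the batch limits how long packets wait behind earlier
    ones; running each packet to completion avoids handing it to
    another thread, so no cross-core synchronization is needed.
    """
    results = []
    for _ in range(min(batch_size, len(rx_queue))):
        packet = rx_queue.popleft()
        results.append(handle(packet))
    return results
```

A dataplane thread would call this in a loop on its own dedicated queue; packets beyond the bound stay in the queue for the next iteration.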
We consider the problem of coordination among replicated SDN controllers, where the challenge is to ensure a consistent view of the network while reacting to network events in a prompt manner. Existing solutions are either consensus-based, which achieve consistency at the expense of high latency; or eventual-consistency-based, which have low latency at the expense of severe limitations on the types of applications and policies implementable by the controller. We propose the Fast and Consistent Controller-Replication (FCR) scheme. FCR is based on a deterministic agreement mechanism that performs agreement on the input of controllers, instead of agreement on the output as done in consensus mechanisms. We formally prove that FCR provides the same guarantees in terms of implementable applications and network policies, as any deterministic single-image controller. Through simulation and implementation, we show that these guarantees can be implemented with little latency overhead, compared to eventual-consistency approaches, and can be achieved significantly faster than consensus-based approaches.
INDEX TERMS Software defined networking, control plane consistency, latency overhead, replicated SDN controllers.
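
The idea of agreeing on controller input rather than output, mentioned in the abstract above, rests on a general property: deterministic replicas that process the same events in the same order produce the same results. The sketch below illustrates only that property. It is not the FCR protocol; the event format and the canonical ordering by sequence number and switch name are assumptions made for illustration.

```python
def run_replica(received_events):
    """Process events in a canonical order and return the outputs.

    Each event is a hypothetical tuple (seq, switch, kind). Sorting the
    de-duplicated events stands in for an agreement step on the input:
    every replica ends up with the same ordered event list regardless
    of the order in which the events arrived.
    """
    agreed_input = sorted(set(received_events))
    down = set()       # switches currently reporting a failed link
    outputs = []
    for seq, switch, kind in agreed_input:
        if kind == "link_down":
            down.add(switch)
        elif kind == "link_up":
            down.discard(switch)
        # Deterministic output: a snapshot of the failed set per event.
        outputs.append((seq, tuple(sorted(down))))
    return outputs
```

Two replicas that receive the same events in different arrival orders return identical output lists, so no separate agreement round on the outputs is needed in this toy setting.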