No abstract
Heterogeneous applications and restricted bandwidth on the Internet have motivated recent works on scavenger congestion control, which yields bandwidth to competing primary traffic for increased network-wide utility. Although potential use cases are quite common, deployments are as yet limited, in part due to protocol design immaturity, lack of open source code, and limited experimental evaluation. In this work, we extend recent scavenger advances by providing(1) open-source implementations of two recent scavenger proposals, PCC Proteus (QUIC-based) and LEDBAT++; (2) early benchmarks of the two in a realistic network setup; and (3) a discussion of APIs needed for applications to take advantage of scavenger congestion control. Ultimately, we hope this line of work will lead to further community discussion, open development, and deployment of scavengers yielding better quality of experience for users. CCS CONCEPTS• Networks → Transport protocols.
Inferring the root cause of failures among thousands of components in a data center network is challenging, especially for "gray" failures that are not reported directly by switches. Faults can be localized through end-to-end measurements, but past localization schemes are either too slow for large-scale networks or sacrifice accuracy. We describe Flock, a network fault localization algorithm and system that achieves both high accuracy and speed at datacenter scale. Flock uses a probabilistic graphical model (PGM) to achieve high accuracy, coupled with new techniques to dramatically accelerate inference in discrete-valued Bayesian PGMs. Large-scale simulations and experiments in a hardware testbed show Flock speeds up inference by >10000x compared to past PGM methods, and improves accuracy over the best previous datacenter fault localization approaches, reducing inference error by 1.19-11x on the same input telemetry, and by 1.2-55x after incorporating passive telemetry. We also prove Flock's inference is optimal in restricted settings.
Enterprise networks today have highly diverse correctness requirements and relatively common performance objectives. As a result, preferred abstractions for enterprise networks are those which allow matching correctness specification, while transparently managing performance. Existing SDN network management architectures, however, bundle correctness and performance as a single abstraction. We argue that this creates an SDN ecosystem that is unnecessarily hard to build, maintain and evolve. We advocate a separation of the diverse correctness abstractions from generic performance optimization, to enable easier evolution of SDN controllers and platforms. We propose Oreo, a first step towards a common and relatively transparent performance optimization layer for SDN. Oreo performs the optimization by first building a model that describes every flow in the network, and then performing network-wide, multi-objective optimization based on this model without disrupting higher level correctness.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.