Cloud data centers host diverse applications, mixing workloads that require small predictable latency with others requiring large sustained throughput. In this environment, today's state-of-the-art TCP protocol falls short. We present measurements of a 6000-server production cluster and reveal impairments that lead to high application latencies, rooted in TCP's demands on the limited buffer space available in data center switches. For example, bandwidth-hungry "background" flows build up queues at the switches, and thus impact the performance of latency-sensitive "foreground" traffic. To address these problems, we propose DCTCP, a TCP-like protocol for data center networks. DCTCP leverages Explicit Congestion Notification (ECN) in the network to provide multi-bit feedback to the end hosts. We evaluate DCTCP at 1 and 10 Gbps speeds using commodity, shallow-buffered switches. We find DCTCP delivers the same or better throughput than TCP, while using 90% less buffer space. Unlike TCP, DCTCP also provides high burst tolerance and low latency for short flows. In handling workloads derived from operational measurements, we found DCTCP enables the applications to handle 10X the current background traffic, without impacting foreground traffic. Further, a 10X increase in foreground traffic does not cause any timeouts, thus largely eliminating incast problems.
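The abstract does not spell out how single-bit ECN marks become "multi-bit feedback". As a minimal sketch, assuming the estimator described in the DCTCP specification (RFC 8257), the sender keeps a running estimate `alpha` of the fraction of marked packets and cuts its window in proportion to it. The class name, the initial values, and the one-packet-per-window growth rule are illustrative assumptions, not details taken from the abstract.

```python
from dataclasses import dataclass


@dataclass
class DctcpSender:
    """Toy model of a DCTCP-style sender (names and defaults are illustrative)."""

    cwnd: float = 10.0    # congestion window, in packets
    alpha: float = 0.0    # running estimate of the fraction of ECN-marked packets
    g: float = 1.0 / 16   # weight given to the newest sample in the moving average

    def on_window_acked(self, acked: int, marked: int) -> None:
        """Process one window of ACKs, `marked` of which carried an ECN echo."""
        if acked <= 0 or not 0 <= marked <= acked:
            raise ValueError("need acked > 0 and 0 <= marked <= acked")

        # Fraction of packets marked in this window: the multi-bit signal.
        frac = marked / acked
        # Exponentially weighted moving average of that fraction.
        self.alpha = (1 - self.g) * self.alpha + self.g * frac

        if marked > 0:
            # Reduce in proportion to the estimated congestion level.
            # alpha == 1 reproduces the usual TCP halving.
            self.cwnd = max(1.0, self.cwnd * (1 - self.alpha / 2))
        else:
            # Simplified additive increase: one packet per window.
            self.cwnd += 1.0


sender = DctcpSender(cwnd=100.0)
sender.on_window_acked(acked=100, marked=100)
print(sender.alpha, sender.cwnd)
```

With a small `alpha` the window shrinks only slightly, which is one plausible reading of how queues can stay short without sacrificing throughput; a real implementation would also handle loss, slow start, and delayed ACKs.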
Many applications require fast data transfer over high-speed and long-distance networks. However, standard TCP fails to fully utilize the network capacity because of its conservative congestion control (CC) algorithm. Some works have proposed more aggressive loss-based CC algorithms to improve a connection's throughput. Although these algorithms can effectively improve link utilization, they suffer from poor RTT fairness. Further, they may severely degrade the performance of regular TCP flows that traverse the same network path. On the other hand, pure delay-based approaches that improve throughput in high-speed networks may not work well when the traffic mixes delay-based and greedy loss-based flows. In this paper, we propose a novel Compound TCP (CTCP) approach, which combines the delay-based and loss-based approaches. Specifically, we add a scalable delay-based component to the standard TCP Reno congestion avoidance algorithm (a.k.a. the loss-based component). The sending rate of CTCP is controlled by both components. The new delay-based component can rapidly increase the sending rate when the network path is underutilized, but gracefully retreats in a busy network when a queue builds up at the bottleneck. Augmented with this delay-based component, CTCP provides very good bandwidth scalability with improved RTT fairness, and at the same time achieves good TCP fairness regardless of the window size. We developed an analytical model of CTCP and implemented it on the Windows operating system. Our analysis and experimental results verify the properties of CTCP.

Index Terms: TCP performance, delay-based congestion control, high speed network

I. INTRODUCTION

Moving bulk data quickly over high-speed data networks is a requirement for many applications.
For example, physicists at the CERN LHC conduct experiments that generate gigabytes of data per second, which must be shared with other scientists around the world [2]. Currently, most applications use the Transmission Control Protocol (TCP) to transmit data over the Internet. TCP provides reliable data transmission with an embedded congestion control algorithm [1], which effectively prevents congestion collapse in the Internet by adjusting the sending rate according to the available bandwidth of the network. However, although TCP has achieved remarkable success (maximizing link utilization and fairly sharing bandwidth between competing flows) in today's Internet environment, it has been reported that TCP substantially underutilizes network bandwidth over high-speed and long-distance networks [4]. In high-speed and long-distance networks, TCP requires a
Several methods have been developed for joint working and spare capacity planning in survivable wavelength-division-multiplexing (WDM) networks. These methods have considered a static traffic demand and optimized the network cost assuming various cost models and survivability paradigms. Our interest primarily lies in network operation under dynamic traffic. We formulate various operational phases in survivable WDM networks as a single integer linear programming (ILP) optimization problem. This common framework avoids service disruption to the existing connections. However, the complexity of the optimization problem makes the formulation applicable only for network provisioning and offline reconfiguration. The direct use of this method for online reconfiguration remains limited to small networks with a few tens of wavelengths.

Our goal in this paper is to develop an algorithm for fast online reconfiguration. We propose a heuristic algorithm based on the LP relaxation technique to solve this problem. Since the ILP variables are relaxed, we provide a way to derive a feasible solution from the relaxed problem. The algorithm consists of two steps. In the first step, the network topology is processed based on the demand set to be provisioned. This preprocessing step is done to ensure that the LP yields a feasible solution. The preprocessing step in our algorithm is based on: a) the assumption that in a network, two routes between any given node pair are sufficient to provide effective fault tolerance and b) an observation on the working of the ILP for such networks. In the second step, using the processed topology as input, we formulate and solve the LP problem. Interestingly, the LP relaxation heuristic yielded a feasible solution to the ILP in all our experiments. We provide insights into why the LP formulation yields a feasible solution to the ILP. We demonstrate the use of our algorithm on practical-size backbone networks with hundreds of wavelengths per link. The results indicate that the run time of our heuristic algorithm is fast enough (on the order of seconds) to be used for online reconfiguration.
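The abstract states only the assumption that two routes per node pair suffice for fault tolerance; it does not say how those routes are chosen. As a hedged illustration of what such a preprocessing step might look like, the sketch below picks a working route by breadth-first search and then a backup route that shares no link with it. The function names, the fewest-hops metric, and the two-pass strategy are all assumptions made for this example, not the authors' method.

```python
from collections import deque


def shortest_path(adj, src, dst, banned=frozenset()):
    """Fewest-hop path from src to dst by BFS, skipping banned undirected links."""
    parent = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            # Walk the parent pointers back to the source.
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in adj.get(node, ()):
            if nxt not in parent and frozenset((node, nxt)) not in banned:
                parent[nxt] = node
                queue.append(nxt)
    return None  # dst is unreachable


def two_routes(adj, src, dst):
    """Return (working, backup) routes that share no link, or None where absent."""
    working = shortest_path(adj, src, dst)
    if working is None:
        return None, None
    # Forbid every link of the working route when searching for the backup.
    used = frozenset(frozenset(pair) for pair in zip(working, working[1:]))
    backup = shortest_path(adj, src, dst, banned=used)
    return working, backup


# Hypothetical four-node ring topology.
ring = {
    "A": ["B", "D"],
    "B": ["A", "C"],
    "C": ["B", "D"],
    "D": ["C", "A"],
}
working, backup = two_routes(ring, "A", "C")
print(working, backup)
```

This two-pass search is deliberately simple: on some topologies it can miss a link-disjoint pair that does exist, a case that Suurballe's algorithm handles. Restricting each demand to two candidate routes is one way the topology handed to the LP could be reduced.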