Many optical circuit switched data center networks (DCNs) have been proposed in the last decade to attain higher capacity and topology reconfigurability, though commercial adoption of these architectures has been minimal. One major challenge these architectures face is the difficulty of handling uncertain traffic demands using commercial optical circuit switches (OCSs) with high switching latency. Prior works have generally focused on developing fast-switching OCS prototypes that react quickly to traffic variations through frequent reconfigurations. This approach, however, adds tremendous complexity to the control plane and raises the barrier to commercial adoption of optical circuit switched data center networks. We propose COUDER, a robust topology and routing optimization framework for reconfigurable optical circuit switched data centers. COUDER co-optimizes topology and routing based on a convex set of traffic matrices, and offers strict throughput guarantees for any future traffic matrix bounded by the convex set. For bursty traffic demands that fall outside the convex set, we employ a desensitization technique to reduce the performance hit. This enables COUDER to generate topology and routing solutions capable of handling unexpected traffic changes without relying on frequent topology reconfigurations. Our extensive evaluations based on Facebook's production DCN traces show that, even with daily reconfigurations, which could be realized by current commercial MEMS-based OCSs from Calient Technologies, COUDER achieves about 20% lower maximum link utilization and about 32% lower average hop count compared to cost-equivalent static topologies. Our work shows that adoption of reconfigurable topologies in commercial DCNs is feasible even without fast OCSs.
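The guarantee over a convex set of traffic matrices can be illustrated with a small sketch. For a fixed topology and routing, each link's load is linear in the traffic matrix, so the maximum link utilization (MLU) over the convex hull of a finite set of matrices is reached at one of those matrices. The code below only illustrates that property and is not COUDER's formulation; the function names, the routing representation (a map from each source-destination pair to the fraction of its demand carried on each link), and the toy numbers are all assumptions made for this example.

```python
# Hypothetical toy network: three directed links with equal capacity.
CAPACITY = {("A", "B"): 10.0, ("B", "C"): 10.0, ("A", "C"): 10.0}

# Assumed fixed routing: fraction of each pair's demand placed on each link.
ROUTING = {
    ("A", "C"): {("A", "C"): 0.5, ("A", "B"): 0.5, ("B", "C"): 0.5},
    ("A", "B"): {("A", "B"): 1.0},
    ("B", "C"): {("B", "C"): 1.0},
}

# Assumed vertices of the convex traffic set (demand per pair).
VERTICES = [
    {("A", "C"): 8.0, ("A", "B"): 2.0, ("B", "C"): 1.0},
    {("A", "C"): 2.0, ("A", "B"): 1.0, ("B", "C"): 7.0},
]


def max_link_utilization(tm, routing, capacity):
    """Return the MLU of one traffic matrix under a fixed routing."""
    load = {link: 0.0 for link in capacity}
    for pair, demand in tm.items():
        for link, fraction in routing[pair].items():
            load[link] += demand * fraction
    return max(load[link] / capacity[link] for link in capacity)


def worst_case_mlu(vertices, routing, capacity):
    """Return the worst-case MLU over the convex hull of `vertices`.

    Link load is linear in the traffic matrix, so MLU is a convex
    function and its maximum over the hull is reached at a vertex.
    """
    return max(max_link_utilization(tm, routing, capacity) for tm in vertices)


def convex_combination(vertices, weights):
    """Return the weighted mix of traffic matrices (weights sum to 1)."""
    pairs = set().union(*vertices)
    return {
        pair: sum(w * tm.get(pair, 0.0) for tm, w in zip(vertices, weights))
        for pair in pairs
    }
```

In this toy case the worst vertex yields an MLU of 0.8, while the midpoint of the two vertices yields 0.65, so bounding the vertices bounds every matrix inside the set. A robust optimizer would additionally search over topology and routing to minimize that worst-case value, which this sketch does not attempt.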
Despite the bandwidth scaling limit of electrical switching and the high cost of building Clos data center networks (DCNs), the adoption of optical DCNs is still limited. There are two reasons. First, existing optical DCN designs usually face high deployment complexity. Second, these designs are not fully optical, and their performance benefit over the non-blocking Clos DCN is not clear. After exploring the design tradeoffs of existing optical DCN designs, we propose TROD (Threshold Routing based Optical Datacenter), a low-complexity optical DCN that outperforms other optical DCNs. Two novel designs in TROD contribute to its success. First, TROD performs robust topology optimization based on recurring traffic patterns and thus does not need to react to every traffic change, which lowers deployment and management complexity. Second, TROD introduces tVLB (threshold-based Valiant Load Balance), which can avoid network congestion as much as possible even under unexpected traffic bursts. We conduct simulations based on both Facebook's real DCN traces and our synthesized highly bursty DCN traces. TROD reduces flow completion time (FCT) by about 1.15-2.16× compared to Google's Jupiter DCN, at least 2× compared to other optical DCN designs, and about 2.4-3.2× compared to expander graph DCNs. Compared with the non-blocking Clos, TROD reduces the hop count of the majority of packets by one, and could even outperform the non-blocking Clos with proper bandwidth over-provisioning at the optical layer. Note that TROD can be built with commercially available hardware and does not require host modifications.
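The abstract does not spell out how tVLB decides between direct and indirect paths, so the sketch below is only one hypothetical reading of a threshold-based variant of Valiant load balancing, not the authors' algorithm. It assumes that traffic stays on the direct path until a utilization threshold is reached and that any excess is spread evenly over two-hop paths through intermediate nodes. The function name, its parameters, and the even-split rule are all assumptions introduced for illustration.

```python
def split_demand(demand, direct_capacity, threshold, intermediates):
    """Split one source-destination demand between direct and two-hop paths.

    Assumed rule (hypothetical, not taken from the TROD paper): keep
    traffic on the direct path up to threshold * direct_capacity, then
    spread the excess evenly over the given intermediate nodes, as
    classic Valiant load balancing spreads traffic over intermediates.
    """
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be in [0, 1]")

    direct_budget = threshold * direct_capacity
    direct = min(demand, direct_budget)
    excess = demand - direct

    if excess > 0 and not intermediates:
        raise ValueError("excess traffic but no intermediate nodes given")

    per_node = excess / len(intermediates) if excess > 0 else 0.0
    return direct, {node: per_node for node in intermediates}


# A demand below the threshold stays entirely on the direct path.
light = split_demand(4.0, 10.0, 0.8, ["C", "D"])

# A burst above the threshold overflows onto the two-hop paths.
burst = split_demand(12.0, 10.0, 0.8, ["C", "D"])
```

Under these assumptions, the threshold trades hop count against congestion: a higher value keeps more traffic on one-hop paths, while a lower value spreads bursts earlier at the cost of extra hops.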