Abstract: While Software-Defined Networking (SDN) has received considerable attention, the scalability of the SDN controller has always been a major concern. One of the main causes of this scalability problem is that the controller is often overwhelmed by a large number of flow setup requests from SDN switches. These requests reduce the number of switches that a single controller can handle. The requests are usually generated when packets of a flow arrive at a switch whose flow table no longer holds the corresponding entry because it has been evicted. Minimizing the number of evictions therefore also reduces the number of requests sent to the controller. This paper addresses the scalability problem and proposes an algorithm that improves the scalability of the SDN controller by dynamically controlling the timeout value of each flow, without modifying the switches. In the proposed approach, the controller collects various traffic parameters from the switches and predicts the inter-arrival times of packets in each flow. Based on this information, it dynamically adjusts the timeout value of each flow so that space in the flow table is reserved in advance for newly arriving flows. This avoids evictions and reduces the number of flow setup requests to the controller. Benchmarking results show that the proposed algorithm reduces the number of packets sent to the controller by 9.9%.
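To make the timeout-control idea concrete, the following is a minimal sketch under stated assumptions, not the paper's algorithm. It assumes the controller predicts each flow's packet inter-arrival time with an exponentially weighted moving average (EWMA) and sets the idle timeout to a small multiple of that prediction. It also assumes that timeouts are shrunk as the flow table fills, so that space frees up before an eviction is forced. The class `TimeoutController`, its parameters (`alpha`, `margin`, `pressure_threshold`, the timeout bounds), and the occupancy input are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class FlowStats:
    """Per-flow packet-arrival statistics (hypothetical structure)."""
    last_arrival: Optional[float] = None
    predicted_gap: Optional[float] = None  # EWMA of inter-arrival times, seconds


class TimeoutController:
    """Sketch: predict inter-arrival gaps and derive per-flow idle timeouts."""

    def __init__(self, alpha: float = 0.5, margin: float = 1.5,
                 min_timeout: float = 1.0, max_timeout: float = 30.0,
                 pressure_threshold: float = 0.8) -> None:
        self.alpha = alpha                            # EWMA smoothing factor
        self.margin = margin                          # slack over the predicted gap
        self.min_timeout = min_timeout
        self.max_timeout = max_timeout
        self.pressure_threshold = pressure_threshold  # occupancy ratio that triggers shrinking
        self.flows: Dict[str, FlowStats] = {}

    def observe(self, flow_id: str, t: float) -> None:
        """Record a packet arrival at time t and update the gap prediction."""
        s = self.flows.setdefault(flow_id, FlowStats())
        if s.last_arrival is not None:
            gap = t - s.last_arrival
            if s.predicted_gap is None:
                s.predicted_gap = gap
            else:
                s.predicted_gap = self.alpha * gap + (1 - self.alpha) * s.predicted_gap
        s.last_arrival = t

    def timeout_for(self, flow_id: str, occupancy: float) -> float:
        """Idle timeout for a flow, given flow-table occupancy in [0, 1]."""
        s = self.flows.get(flow_id)
        if s is None or s.predicted_gap is None:
            timeout = self.min_timeout  # no history yet: keep the entry short-lived
        else:
            timeout = self.margin * s.predicted_gap
        if occupancy > self.pressure_threshold:
            # Shrink timeouts linearly as the table fills, reserving space early.
            headroom = (1.0 - occupancy) / (1.0 - self.pressure_threshold)
            timeout *= max(headroom, 0.0)
        return min(max(timeout, self.min_timeout), self.max_timeout)
```

For example, a flow observed at t = 0, 2, and 4 s has a predicted gap of 2 s. That gives a 3 s timeout at 50% occupancy, which drops to about 1.5 s at 90% occupancy. The linear shrinking rule is only one plausible way to "reserve space in advance"; the paper's actual policy may use different traffic parameters.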
This paper proposes Hermes, a container-based preemptive GPU scheduling framework for accelerating hyper-parameter optimization in deep learning (DL) clusters. Hermes accelerates hyper-parameter optimization by time-sharing GPUs among DL jobs and prioritizing jobs with more promising hyper-parameter combinations. Hermes's scheduling policy is grounded in the observation that good hyper-parameter combinations converge quickly in the early phases of training. By giving higher priority to fast-converging containers, Hermes's GPU preemption mechanism accelerates training. This enables users to find optimal hyper-parameters faster without losing the progress of a preempted container. We have implemented Hermes on Kubernetes and compared its performance against existing scheduling frameworks. Experiments show that Hermes reduces the time for hyper-parameter optimization by up to 4.04 times compared with previously proposed scheduling policies such as FIFO, round-robin (RR), and SLAQ, with minimal time-sharing overhead.
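The convergence-based prioritization can be illustrated with a minimal time-slice scheduler. This is a hypothetical sketch, not Hermes's implementation. It assumes each trial reports training losses, scores a trial by its relative loss drop over the last few reports, and gives the available GPUs to the top-scoring trials for the next slice. The remaining trials are preempted, and their progress is assumed to be checkpointed rather than lost. The `Job` class, `convergence_score`, `schedule`, and the `window` parameter are illustrative names. New trials with no history score highest so that each one gets an initial run.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Job:
    """A hyper-parameter trial running in a container (hypothetical model)."""
    name: str
    losses: List[float] = field(default_factory=list)  # observed training losses
    running: bool = False

    def convergence_score(self, window: int = 3) -> float:
        """Relative loss drop over the last `window` reports; higher is better."""
        if len(self.losses) < 2:
            return float("inf")  # untried jobs get a chance to run first
        recent = self.losses[-window:]
        return (recent[0] - recent[-1]) / max(recent[0], 1e-12)


def schedule(jobs: List[Job], num_gpus: int) -> List[str]:
    """Pick which jobs hold GPUs for the next time slice; preempt the rest."""
    # sorted() is stable, so ties (e.g. several new jobs) keep submission order.
    ranked = sorted(jobs, key=lambda j: j.convergence_score(), reverse=True)
    chosen = {j.name for j in ranked[:num_gpus]}
    for j in jobs:
        # A preempted job only pauses; its progress is assumed to be checkpointed.
        j.running = j.name in chosen
    return [j.name for j in ranked[:num_gpus]]
```

With two GPUs, a new trial and a trial whose loss fell from 1.0 to 0.3 would run ahead of one whose loss fell only from 1.0 to 0.9. Re-running `schedule` every slice is what turns this ranking into preemptive time-sharing.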