Hybrid and direct topologies are cost-efficient and scalable options to interconnect thousands of end nodes in high-performance computing (HPC) systems. They offer a rich path diversity, high bisection bandwidth, and a reduced diameter guaranteeing low latency. In these topologies, efficient deterministic routing algorithms can be used to balance smartly the traffic flows among the available routes. Unfortunately, congestion leads these networks to saturation, where the HoL blocking effect degrades their performance dramatically. Among the proposed solutions to deal with HoL blocking, the routing algorithms selecting alternative routes, such as adaptive and oblivious, can mitigate the congestion effects. Other techniques use queues to separate congested flows from non-congested ones, thus reducing the HoL blocking. In this article, we propose a new approach that reduces HoL blocking in hybrid and direct topologies using source-adaptive and oblivious routing. This approach also guarantees deadlock-freedom as it uses virtual networks to break potential cycles generated by the routing policy in the topology. Specifically, we propose two techniques, called Source-Adaptive Solution for Head-of-Line Blocking Avoidance (SASHA) and Oblivious Solution for Head-of-Line Blocking Avoidance (OSHA). Experiment results, carried out through simulations under different traffic scenarios, show that SASHA and OSHA can significantly reduce the HoL blocking.
Interconnection networks are the communication backbone of modern high-performance computing systems and an optimised interconnection network is crucial for the performance and utilisation of the system as a whole. One element of the interconnection network is the routing algorithm, which directly influences how we are able to utilise the physical network topology. InfiniBand is one of the most common network architectures used in high-performance computing and traditionally it only supported static routing. For multi-path networks such as Fat-trees, static routing is inefficient because it cannot balance traffic in real-time nor utilise multiple paths efficiently under adversarial traffic. This again potentially leads to unnecessary contention and an underutilised network, which has led to numerous proposals on how to avoid this by using adaptive routing. Adaptive routing has recently been introduced in InfiniBand and in this paper we evaluate to what extent the expected benefits of adaptive routing is true for InfiniBand. Through a set of experiments on HDR InfiniBand equipment we describe the basic behaviour of adaptive routing in InfiniBand, its benefits in Fat tree topologies and the unfortunate side effects related to unfairness that adaptive routing in general might introduce, including such phenomena as the reverse parking lot problem and congestion spreading.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.