The jellyfish topology, in which switches are connected using a random graph, has recently been proposed for large-scale data-center networks. It has been shown to offer higher bisection bandwidth and better permutation throughput than a fat-tree topology of similar cost. In this work, we propose a new routing scheme for jellyfish that outperforms existing schemes by exploiting path diversity more effectively, and we comprehensively compare the performance of jellyfish and fat-tree topologies under HPC workloads. The results indicate that jellyfish and fat-tree topologies offer comparably high performance for HPC workloads on systems that can be realized by 3-level fat-trees using current technology and by the corresponding jellyfish topologies of similar cost. Fat-trees are more effective for smaller systems, while jellyfish is more scalable.
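To make the idea of "switches connected using a random graph" concrete, the following is a minimal sketch of one way to build such a topology. It is a simplification, not the construction used in the abstract above: it repeatedly links a random pair of switches that both have free ports and are not yet neighbors, and it stops when no such pair remains (so a few ports may stay unused). The names `jellyfish_graph`, `num_switches`, and `ports_per_switch` are hypothetical.

```python
import random


def jellyfish_graph(num_switches, ports_per_switch, seed=0):
    """Build a random switch-to-switch graph (simplified sketch).

    Returns an adjacency dict {switch: set(neighbor switches)}.
    Assumption: only inter-switch ports are modeled; host ports are ignored.
    """
    rng = random.Random(seed)
    adj = {s: set() for s in range(num_switches)}
    while True:
        # Switches that still have at least one free inter-switch port.
        open_switches = [s for s in adj if len(adj[s]) < ports_per_switch]
        # All pairs of open switches that are not already linked.
        candidates = [
            (a, b)
            for i, a in enumerate(open_switches)
            for b in open_switches[i + 1:]
            if b not in adj[a]
        ]
        if not candidates:
            break  # no legal link can be added any more
        a, b = rng.choice(candidates)
        adj[a].add(b)
        adj[b].add(a)
    return adj


if __name__ == "__main__":
    topo = jellyfish_graph(num_switches=20, ports_per_switch=4, seed=1)
    for switch, neighbors in sorted(topo.items()):
        print(switch, sorted(neighbors))
```

Enumerating all candidate pairs on every step is quadratic in the number of switches, which is acceptable for a small illustration but not for a data-center-sized graph.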
We consider a general form of routing, called limited multi-path routing, on extended generalized fat-trees, where the number of paths between each pair of processing nodes is a parameter. Existing single-path routing and multi-path routing for such topologies are special cases of limited multi-path routing. We propose path calculation heuristics, including shift-1, disjoint, and random, for limited multi-path routing on extended generalized fat-trees. All of these heuristics are based on existing single-path routing schemes, work for limited multi-path routing with any given number of paths between processing nodes, gracefully increase routing performance as that number increases, and reach optimal performance when all shortest paths between processing nodes are allowed to carry traffic. Flow-level and flit-level simulation experiments are carried out to study the performance. The results show that the disjoint heuristic significantly outperforms the other methods.
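The abstract names the three heuristics but does not define them, so the sketch below is only one plausible reading, not the authors' definitions. It assumes a 3-level tree in which a shortest path is identified by a pair of up-port choices `(u1, u2)`, encoded as the index `u1 * w2 + u2`, and that a single-path scheme has already picked a starting index `base`. Under that assumed encoding, "shift-1" takes consecutive indices, "random" adds randomly chosen extra paths, and "disjoint" changes the lowest-level up-link first so that the selected paths diverge as early as possible. The function name and all parameters are hypothetical.

```python
import random


def select_paths(base, w1, w2, k, heuristic, seed=0):
    """Pick k path indices out of w1 * w2 shortest paths (illustrative only).

    Assumed encoding: index = u1 * w2 + u2, where u1 is the up-port taken
    at the leaf switch and u2 the up-port taken one level higher.
    """
    total = w1 * w2
    if k < 1:
        raise ValueError("k must be at least 1")
    k = min(k, total)
    if heuristic == "shift1":
        # Consecutive indices: early picks tend to share the leaf up-link.
        return [(base + i) % total for i in range(k)]
    if heuristic == "random":
        rng = random.Random(seed)
        others = [p for p in range(total) if p != base]
        return [base] + rng.sample(others, k - 1)
    if heuristic == "disjoint":
        # Step by w2 first so each new path uses a different leaf up-link;
        # only after all w1 leaf up-links are used, shift by one and repeat.
        return [(base + (i % w1) * w2 + i // w1) % total for i in range(k)]
    raise ValueError("unknown heuristic: " + heuristic)


if __name__ == "__main__":
    for name in ("shift1", "disjoint", "random"):
        picked = select_paths(base=0, w1=2, w2=2, k=2, heuristic=name)
        # Show each pick as (leaf up-link, second-level up-link).
        print(name, [(p // 2, p % 2) for p in picked])
```

With `w1 = w2 = 2`, `base = 0`, and `k = 2`, the shift-1 reading selects indices 0 and 1, which share leaf up-link 0, while the disjoint reading selects indices 0 and 2, which leave the leaf switch on different links. When `k` equals the total number of shortest paths, every heuristic returns all of them.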
Fat-tree based system area networks have been widely adopted in high performance computing clusters. In such systems, the routing is often deterministic and the traffic demand is usually uncertain and changing. In this paper, we study routing performance on fat-tree based system area networks with deterministic routing under the assumption that the traffic demand is uncertain. The performance of a routing algorithm under uncertain traffic demands is characterized by the oblivious performance ratio, which bounds the performance of the routing algorithm relative to the optimal routing algorithm for any given traffic demand. We consider both single-path routing, where the traffic between each source-destination pair follows one path, and multi-path routing, where multiple paths can be used for the traffic between a source-destination pair. We derive lower bounds on the oblivious performance ratio of any single-path routing scheme for fat-tree topologies and develop single-path oblivious routing schemes that achieve the optimal oblivious performance ratio for commonly used fat-tree topologies. These oblivious routing schemes provide the best performance guarantees among all single-path routing algorithms under uncertain traffic demands. For multi-path routing, we show that it is possible to obtain a scheme that is optimal for any traffic demand (an oblivious performance ratio of 1) on the fat-tree topology. These results quantitatively demonstrate that single-path routing cannot guarantee high routing performance, while multi-path routing is very effective in balancing network loads on the fat-tree topology.
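The oblivious performance ratio described above can be written out as follows. This is a sketch of the commonly used formulation and assumes that performance is measured by the maximum link load; the abstract itself does not name the metric. The symbols $r$ (a routing scheme), $D$ (a traffic demand), and $\mathrm{MLOAD}(r, D)$ (the maximum link load when $r$ carries $D$) are notation introduced here for illustration.

```latex
\[
\mathrm{PERF}(r, D) \;=\; \frac{\mathrm{MLOAD}(r, D)}{\min_{r'} \mathrm{MLOAD}(r', D)},
\qquad
\mathrm{PERF}(r) \;=\; \max_{D} \, \mathrm{PERF}(r, D)
\]
```

Under this reading, $\mathrm{PERF}(r) \ge 1$ for every routing scheme, and a ratio of exactly 1 means the scheme matches the optimal routing for every traffic demand, which is the property the abstract claims for multi-path routing on fat-trees.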
To realize a path in an InfiniBand network, an address, known as a Local IDentifier (LID) in the InfiniBand specification, must be assigned to the destination of the path and used in the forwarding tables of intermediate switches to direct the traffic along the path. Hence, routing in InfiniBand has two components: (1) computing all paths, and (2) assigning LIDs to destinations and using them in intermediate switches to realize the paths. We refer to the task of computing paths as path computation and the task of assigning LIDs as LID assignment. This paper focuses on the LID assignment component, whose major issue is to minimize the number of LIDs required to support a given set of paths. We prove that the problem of realizing a given set of paths with a minimum number of LIDs is NP-complete, develop an integer linear programming formulation for this problem, design a number of heuristics that are effective and efficient in practical cases, and evaluate the performance of the heuristics through simulation. The experimental results indicate that the performance of our best-performing heuristic is very close to optimal. We further demonstrate that by separating path computation from LID assignment and using schemes that are known to achieve good performance for each task separately, more effective routing schemes than existing ones can be developed.
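Because forwarding is keyed on the destination LID, two paths to the same destination can share a LID only if they never disagree on the next hop at any node they have in common. The sketch below illustrates that constraint with a simple first-fit greedy grouping. It is not one of the heuristics from the abstract above, and it makes no claim of minimality (the problem is stated to be NP-complete). The function name `assign_lids` and the path representation (a list of node names ending at the destination, with no repeated nodes) are assumptions made for this example.

```python
def assign_lids(paths):
    """Group paths to ONE destination so each group can share a LID.

    paths: list of node sequences, each ending at the same destination.
    Returns a list of (forwarding_table, member_paths) tuples, where
    forwarding_table maps node -> next hop for that group's LID.
    First-fit greedy: illustrative only, not guaranteed to be minimal.
    """
    groups = []
    for path in paths:
        # Next hop required at every node along this path.
        hops = dict(zip(path[:-1], path[1:]))
        for table, members in groups:
            # Compatible if no shared node needs a different next hop.
            if all(table.get(node, nxt) == nxt for node, nxt in hops.items()):
                table.update(hops)
                members.append(path)
                break
        else:
            # No existing group fits: this path needs a new LID.
            groups.append((dict(hops), [path]))
    return groups


if __name__ == "__main__":
    demo_paths = [
        ["s1", "A", "C", "d"],
        ["s2", "B", "C", "d"],
        ["s1", "A", "E", "d"],  # diverges from the first path at switch A
    ]
    for lid_index, (table, members) in enumerate(assign_lids(demo_paths)):
        print("LID group", lid_index, members)
```

In the demo, the first two paths merge at switch `C` and agree on every shared next hop, so they fit under one LID. The third path needs switch `A` to forward to `E` instead of `C`, so it requires a second LID.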