2015 IEEE International Conference on Cluster Computing 2015
DOI: 10.1109/cluster.2015.135

Fault-Tolerant Routing for Exascale Supercomputer: The BXI Routing Architecture

Cited by 4 publications (3 citation statements)
References 32 publications
“…This contrasts with common techniques based on simulation or experimentation which do not link observations of contention with a corresponding explanation. This simple technique of estimation of contention is new; it is also used in concert with the architecture described by Vigneras & Quintin [8] with the goal of automating computation of that metric for potential integration into the fabric management's decision making. Analysis in terms of this metric is sufficient to prove and explain drawbacks and benefits of algorithms, but a simulation-based analysis would complement this work to give tangible results for real-life applications.…”
Section: A Static Congestion Metric (mentioning)
confidence: 99%
“…No perfect agnostic algorithm exists, and designing a good routing algorithm usually requires paying attention to the topology and communication patterns which will take place. As detailed by Vigneras & Quintin [8], we can consider that the topology of an HPC cluster never changes, so algorithms are usually designed for a given topology class (i.e. topology-aware algorithms).…”
Section: Introduction (mentioning)
confidence: 99%
“…PQFT [4] is similar, though it requires a complete list of faults. The combination of Dmodk + Ftrnd_diff [11] available in BXI FM [12] is applied in an offline/online manner (with an iterative list of topology changes and an up-to-date view of the topology), the goal being fast reaction to faults with minimal routing changes. Fabriscale [13] also provides fast centralized re-routing of fat-trees, by precomputing alternative routes.…”
Section: Introduction (mentioning)
confidence: 99%