2006
DOI: 10.1109/tc.2006.46

A routing methodology for achieving fault tolerance in direct networks

Abstract: Massively parallel computing systems are being built with thousands of nodes. The interconnection network plays a key role for the performance of such systems. However, the high number of components significantly increases the probability of failure. Additionally, failures in the interconnection network may isolate a large fraction of the machine. It is therefore critical to provide an efficient fault-tolerant mechanism to keep the system running, even in the presence of faults. This paper presents a …

Cited by 78 publications (83 citation statements)
References 44 publications
“…Some of these definitions are reiterated from previous works [6,7,10-12], for the sake of completeness.…”
Section: Terminologies (citation type: mentioning)
confidence: 99%
“…The torus has been popular interconnection network topology in contemporary systems [6] due to their desirable properties, such as ease of implementation and ability to exploit communication locality to reduce message latency [12]. In addition, torus is a regular (i.e., all nodes have the same degree) and edge-symmetric network, which improves load balancing across the channels [13].…”
Section: The Torus Topology (citation type: mentioning)
confidence: 99%
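
The regularity property mentioned in the statement above (all nodes have the same degree) can be illustrated with a minimal sketch. The helper names `torus_neighbors` and `degrees` are hypothetical, chosen here for illustration, and do not come from the cited papers. The sketch assumes a k-ary n-dimensional torus in which each node links to its +1 and -1 neighbors in every dimension, with wraparound.

```python
from itertools import product


def torus_neighbors(node, dims):
    """Return the set of neighbors of `node` in a torus with per-dimension sizes `dims`."""
    neighbors = set()
    for axis, size in enumerate(dims):
        for step in (-1, 1):
            coords = list(node)
            # The modulo models the wraparound link that closes each ring.
            coords[axis] = (coords[axis] + step) % size
            neighbors.add(tuple(coords))
    # Guard for degenerate size-1 dimensions, where a node would neighbor itself.
    neighbors.discard(node)
    return neighbors


def degrees(dims):
    """Return the set of distinct node degrees found across the whole torus."""
    nodes = product(*(range(size) for size in dims))
    return {len(torus_neighbors(node, dims)) for node in nodes}
```

If the network is regular, `degrees` returns a single-element set; for example, a 4x4 torus under these assumptions gives every node exactly four neighbors, including nodes on what would be the boundary of a mesh.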