1993
DOI: 10.21236/ada266594

Fail-Safe PVM: A Portable Package for Distributed Programming with Transparent Recovery

Cited by 56 publications (29 citation statements)
References 18 publications (7 reference statements)

“…These approaches typically launch daemons on every node that form and maintain communication groups that allow tracking and managing recovery by maintaining the configuration of the communication system. The failure of any given node in the group is handled by restarting the failed process on a different node, by restructuring the computation, or through transparent migration to another node [2] [13] [51].…”
Section: Operating System and Runtime-based Solutions (mentioning)
confidence: 99%
“…This can be done by a consistent checkpointing scheme that saves a global checkpoint to a central file server at very coarse intervals (for example, once every hour or day). Such checkpointing schemes are straightforward and have been discussed and implemented elsewhere [11,17,18,26,29,36,38,43].…”
Section: A Model For Scientific Programs That Live On a Now (mentioning)
confidence: 99%
“…In other words, rather than implement checkpointing transparently as in MIST [11], Fail-Safe PVM [29], or CoCheck [43], we hardwire it into the program. This is beneficial for several reasons.…”
Section: The Checkpointing Algorithm (mentioning)
confidence: 99%
“…fault-tolerant networks and system reconfiguration after a fault. There has been some though, for example, FT-Linda [4], PLinda [15], Orca [16], Calypso [5], and Fail-safe PVM [17]. These systems use a combination of well known mechanisms such as replication, transactions, message logging, or checkpoints and rollbacks to provide fault-tolerance.…”
Section: Related Work (mentioning)
confidence: 99%