2019
DOI: 10.1016/j.simpat.2018.09.012

Fault tolerant adaptive parallel and distributed simulation through functional replication

Abstract: This paper presents FT-GAIA, a software-based fault-tolerant parallel and distributed simulation middleware. FT-GAIA has been designed to reliably handle Parallel And Distributed Simulation (PADS) models, which are needed to properly simulate and analyze complex systems arising in any kind of scientific or engineering field. PADS takes advantage of multiple execution units running on multicore processors, clusters of workstations, or HPC systems. However, large computing systems, such as HPC systems that include hu…

citations
Cited by 3 publications
(2 citation statements)
references
References 38 publications
“…To detect the fault, the cloud or Fog statuses are regularly monitored and various responses, such as replication, checkpoint, and resubmission, are recorded [8]. Gabriele et al [9] suggested the use of a Fault-Tolerance Generic Adaptive Interaction Architecture (FT-GAIA) software-based FT method for parallel and distributed environments. Mainly, this method deals with Byzantine faults and crash errors with the help of server restoration in the cloud layer, and this is achieved with the help of a replication mechanism.…”
Section: Fault Tolerance and Node Discovery (mentioning)
confidence: 99%
“…$E_{exe} = T_{exe} \times p_{na}$ (9), where $p_k$ and $p_{na}$ denote the power consumption of the IoT device and $f_{na}$, respectively, during task execution.…”
Section: Energy and Priority Model (mentioning)
confidence: 99%