2010
DOI: 10.5120/636-891

Dynamic Adaptation of Checkpoints and Rescheduling in Grid Computing

Abstract: Grid computing is a form of distributed computing that mainly serves to virtualize and utilize geographically distributed idle resources. A grid is a distributed computational and storage environment, often composed of heterogeneous, autonomously managed subsystems. As a result, varying resource availability becomes commonplace, often resulting in the loss and delay of executing jobs. To ensure good performance, fault tolerance should be taken into account. Here we address fault tolerance in terms of resource failure. Commonly utilized …

Cited by 13 publications (5 citation statements)
References 10 publications
“…In their work [10], Theresa et al propose two dynamic checkpoint strategies: Last Failure time-based Checkpoint Adaptation (LFCA) and Mean Failure time-based Checkpoint Adaptation (MFCA), which takes into account the stability of the system and the probability of failure concerning individual resources.…”
Section: State of the Art
Citation type: mentioning
confidence: 99%
“…In that work, by means of simulation, they propose a similar approach to ours, trying to both increase the utilization and meet job deadlines, but without providing any differentiation of QoS levels. These rescheduling techniques can also be used to try to provide fault tolerance performance, such as the work presented in [52], where this is provided by changing the frequency of the checkpointing process based on current status and history of failure information of the resource and by rescheduling the jobs when those failures happen. However, they do not migrate jobs to increase utilization or QoS differentiation amongst users.…”
Section: QoS Mechanisms
Citation type: mentioning
confidence: 99%
“…The authors of [5,6,7,8] make some assumptions or gather statistics about failure distribution of individual resources or resource systems and based on these values and calculations adjust fault tolerant mechanisms. For example Theresa et al propose in their work [8] two dynamic checkpoint strategies: Last Failure time based Checkpoint Adaptation (LFCA) and Mean Failure time based Checkpoint Adaptation (MFCA) which takes into account the stability of the system and the probability of failure concerning the individual resources. Young in [5] has already in 1974 defined his formula for the optimum periodic checkpoint interval which is based on the checkpointing cost and the mean time between failures (MTBF) with the assumption that failure intervals follow an exponential distribution.…”
Section: A Fault Tolerance
Citation type: mentioning
confidence: 99%
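
The last statement above refers to Young's 1974 formula, which derives a periodic checkpoint interval from the checkpointing cost and the mean time between failures (MTBF). As a rough illustration, the sketch below uses the commonly cited first-order form, interval = sqrt(2 × cost × MTBF), and estimates the MTBF from a list of past failure timestamps. The function names and sample values are hypothetical and chosen only for this example; this is not the LFCA or MFCA strategy from the paper, whose details are not given on this page.

```python
import math


def mtbf_from_history(failure_times):
    """Estimate MTBF as the mean gap between consecutive failure timestamps.

    Hypothetical helper for illustration; expects at least two timestamps
    in ascending order, all in the same time unit.
    """
    if len(failure_times) < 2:
        raise ValueError("need at least two failure timestamps")
    gaps = [later - earlier
            for earlier, later in zip(failure_times, failure_times[1:])]
    return sum(gaps) / len(gaps)


def young_interval(checkpoint_cost, mtbf):
    """First-order approximation of Young's optimum checkpoint interval.

    Assumes exponentially distributed failure intervals and a checkpoint
    cost that is small compared with the MTBF.
    """
    if checkpoint_cost <= 0 or mtbf <= 0:
        raise ValueError("checkpoint_cost and mtbf must be positive")
    return math.sqrt(2.0 * checkpoint_cost * mtbf)


if __name__ == "__main__":
    # Illustrative values only: failures observed at these times (seconds),
    # and a checkpoint that takes 2 seconds to write.
    history = [0.0, 100.0, 300.0]
    mtbf = mtbf_from_history(history)
    interval = young_interval(2.0, mtbf)
    print(f"estimated MTBF: {mtbf:.1f} s, checkpoint every {interval:.1f} s")
```

Recomputing the MTBF as new failures are recorded gives a simple way to vary the checkpoint frequency with a resource's failure history, which is the general idea the citing papers attribute to the work; the actual adaptation rules would need to be taken from the paper itself.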