Proceedings of the ITI 2009 31st International Conference on Information Technology Interfaces 2009
DOI: 10.1109/iti.2009.5196152

Adaptive checkpointing in dynamic grids for uncertain job durations

Abstract: Adaptive checkpointing is a relatively new approach that is particularly suitable for providing fault-tolerance in dynamic and unstable grid environments. The approach allows for periodic modification of checkpointing intervals at run-time, when additional information becomes available. In this paper an adaptive algorithm, named MeanFailureCP+, is introduced that deals with checkpointing of grid applications with execution times that are unknown a priori. The algorithm modifies its parameters, based on dynamic…
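
The idea of modifying the checkpointing interval at run-time can be illustrated with a minimal sketch. This is not the published MeanFailureCP+ algorithm, whose details are not given here; the class `IntervalAdapter`, its fixed `step`, the clamping bounds, and the comparison rule are all assumptions chosen for illustration. The rule lengthens the interval when the job is expected to finish before the next failure and shortens it otherwise.

```python
from dataclasses import dataclass


@dataclass
class IntervalAdapter:
    """Hypothetical interval-adjustment rule (illustrative, not MeanFailureCP+)."""
    interval: float      # current checkpointing interval, in seconds
    step: float          # assumed fixed amount added or removed per adjustment
    min_interval: float  # assumed lower bound on the interval
    max_interval: float  # assumed upper bound on the interval

    def adjust(self, est_remaining: float, mtbf: float) -> float:
        """Update the interval from the estimated remaining run time and the
        observed mean time between failures (both in seconds)."""
        if est_remaining < mtbf:
            # Job will probably finish before the next failure: checkpoint less often.
            candidate = self.interval + self.step
        else:
            # A failure is likely before completion: checkpoint more often.
            candidate = self.interval - self.step
        # Keep the interval inside the assumed bounds.
        self.interval = max(self.min_interval, min(self.max_interval, candidate))
        return self.interval


def mean_time_between_failures(failure_times, observed_for):
    """Observation time divided by failure count; infinity if no failure was seen."""
    if not failure_times:
        return float("inf")
    return observed_for / len(failure_times)


adapter = IntervalAdapter(interval=600, step=120, min_interval=120, max_interval=1800)
mtbf = mean_time_between_failures([900.0, 2500.0], observed_for=3600.0)
new_interval = adapter.adjust(est_remaining=5000.0, mtbf=mtbf)
```

In this sketch the estimate of remaining run time is supplied by the caller, so it can be revised at each adjustment as more information becomes available, which is the situation the abstract describes for jobs whose duration is unknown in advance.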

Cited by 4 publications (3 citation statements)
References 9 publications
“…[Zhang and Chakrabarty 2003] adapt the checkpointing frequency in soft real-time embedded systems, minimizing energy consumption while tolerating a fixed number of faults for a given task. [Chtepen et al 2009] propose the use of adaptive checkpointing in computational grids, adjusting the interval between checkpoints according to the estimated job execution time and the frequency of resource failures; the goal is to balance the checkpointing overhead against the cost of restarting jobs interrupted by failures. What these works have in common is a focus on controlling execution time by limiting the recovery time after a failure, without directly addressing responsiveness.…”
Section: Related Work (Trabalhos Relacionados)
Citation type: unclassified
“…Although the good results that this approaches provides, they stay limited to a static assignment. Various research is examined in this issue (Falzon & Li, 2010; Nakechbandi, Colin, & Gashumba, 2007; Chen, Zhang, & Hao 2008). Large and non dedicated computing platforms as grids may require dynamic task assignment methods to adapt to the run-time changes such as increases in the workload or resources, processor failures, and link failures (Uçar, Aykanat, Kaya, & Ikinci, 2006; He & Zhao, 2008; Chtepen, Dhoedt, Turck, & Demeester, 2008).…”
Section: Related Work
Citation type: mentioning
confidence: 99%
“…This algorithm is designed to modify a job checkpointing interval as a function of mean failure frequency of resources where the job is being executed, and the total job execution time. In [7], they developed the MeanFailureCP+ algorithm which is a modification of the MeanFailureCP that deals with checkpointing of grid applications with execution times that are unknown a priori. Review of literature reveals that a large number of research efforts have already been devoted to tolerate faults in computational grids.…”
Section: Introduction
Citation type: mentioning
confidence: 99%