2002
DOI: 10.1145/511399.511362

Improving cluster availability using workstation validation

Abstract: We demonstrate a framework for improving the availability of cluster-based Internet services. Our approach models Internet services as a collection of interconnected components, each possessing well-defined interfaces and failure semantics. Such a decomposition allows designers to engineer high availability based on an understanding of the interconnections and isolated fault behavior of each component, as opposed to ad hoc methods. In this work, we focus on using the entire commodity workstation as a component…
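The abstract is cut off before it describes the framework itself, so the following is only a generic illustration and not the authors' model: a minimal Python sketch of the textbook series/parallel availability calculation that a component decomposition makes possible. It assumes component failures are independent; the function names, the front-end/workstation layout, and the availability figures are all hypothetical.

```python
from math import prod


def series_availability(avails: list[float]) -> float:
    """Every component in the chain must be up, so availabilities multiply."""
    return prod(avails)


def parallel_availability(avails: list[float]) -> float:
    """Redundant replicas: the group is down only if every replica is down.

    Assumes replica failures are independent of each other.
    """
    return 1.0 - prod(1.0 - a for a in avails)


# Hypothetical service: one front end in series with a pair of redundant workstations.
front_end = 0.999
workstation = 0.99
workstation_pair = parallel_availability([workstation, workstation])
service = series_availability([front_end, workstation_pair])

print(f"workstation pair: {workstation_pair:.6f}")
print(f"whole service:    {service:.6f}")
```

Under these assumptions, duplicating the workstation raises that tier from 0.99 to 0.9999, which leaves the single front end as the limiting component.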

Cited by 39 publications (48 citation statements)
References 9 publications

“…Although our results for two particular production workloads show that Daly's approximation leads to reasonably good results in the no-replication case (see the g = 1 curves in Fig. 5), there is evidence that, in general, failure distributions are well approximated by Weibull distributions [17]–[21], while not at all by exponential distributions. Most recently, in [21], the authors show that failures observed on a production cluster, over a cumulative 42-month time period, are modeled well by a Weibull distribution with shape parameter k < 0.5.…”
Section: When Is Process Replication Beneficial?
Citation type: mentioning (confidence: 72%)
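To make the Weibull/exponential distinction concrete, here is a minimal Python sketch comparing the hazard (instantaneous failure rate) of a Weibull distribution, h(t) = (k/λ)(t/λ)^(k−1), with the constant hazard of an exponential one. The shape value 0.4 only illustrates the k < 0.5 regime mentioned in the quote; the 100-hour scale and the evaluation times are hypothetical.

```python
def weibull_hazard(t: float, shape: float, scale: float) -> float:
    """Weibull hazard h(t) = (k / lam) * (t / lam) ** (k - 1), for t > 0."""
    return (shape / scale) * (t / scale) ** (shape - 1)


def exponential_hazard(t: float, rate: float) -> float:
    """Exponential hazard: constant, independent of t (t is kept for symmetry)."""
    return rate


# Illustrative parameters (assumptions): shape k = 0.4, scale 100 h,
# and an exponential with the same 100 h scale (rate 1/100 per hour).
k, lam = 0.4, 100.0
for t in (1.0, 10.0, 100.0, 1000.0):
    print(f"t={t:7.1f} h  weibull h(t)={weibull_hazard(t, k, lam):.5f}  "
          f"exponential h(t)={exponential_hazard(t, 1.0 / lam):.5f}")
```

With k < 1 the Weibull hazard falls steadily as t grows, whereas the exponential hazard stays flat at 1/100 per hour; that flat rate is exactly what a memoryless model assumes.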
“…Exponential failures are often assumed due to their convenient memoryless property, i.e., the fact that the time to the next failure does not depend on when the last failure has occurred [16]. But the non-memoryless Weibull distribution is recognized as a more realistic model [17][18][19][20][21]. The work in this paper relates to CRR in the sense that we study a replication mechanism that is complementary to checkpointing.…”
Section: Related Work
Citation type: mentioning (confidence: 99%)
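The memoryless property can be checked numerically: for an exponential time to failure T, P(T > s + t | T > s) = P(T > t) for every elapsed time s. This hypothetical Python sketch evaluates that conditional probability over a 24-hour window using closed-form survival functions; the exponential mean of 100 hours and the Weibull parameters (shape 0.4, scale 100 hours) are illustrative assumptions, not data from the cited works.

```python
import math
from functools import partial


def exp_survival(t: float, rate: float) -> float:
    """P(T > t) for an exponential time to failure with the given rate."""
    return math.exp(-rate * t)


def weibull_survival(t: float, shape: float, scale: float) -> float:
    """P(T > t) for a Weibull time to failure with shape k and scale lambda."""
    return math.exp(-((t / scale) ** shape))


def conditional_survival(survival, s: float, t: float) -> float:
    """P(T > s + t | T > s): chance of lasting t more hours after s failure-free hours."""
    return survival(s + t) / survival(s)


# Illustrative parameters (assumptions, not measured data).
exp_s = partial(exp_survival, rate=1.0 / 100.0)            # mean time to failure 100 h
wei_s = partial(weibull_survival, shape=0.4, scale=100.0)  # shape k = 0.4 < 0.5

window = 24.0  # hours
for s in (0.0, 50.0, 500.0):
    print(f"s={s:5.0f} h  exponential={conditional_survival(exp_s, s, window):.4f}  "
          f"weibull={conditional_survival(wei_s, s, window):.4f}")
```

The exponential column stays constant at exp(−0.24), roughly 0.787, regardless of s, while the Weibull column rises with s: under a decreasing hazard, a node that has already run failure-free for a long time is more likely to survive the next 24 hours.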
“…More attention has been paid to modeling the characteristics of resource availability and many researches [5], [8]–[10], [21]–[23] show that strong temporal and spatial correlations of failure events and resource failures follow the Weibull, Hyperexponential and Pareto distributions with different parameters rather than a Poisson distribution. Oliner et al [4] and Zhang et al [5] evaluate three application-level periodic checkpoint heuristics, checkpointing all jobs, long jobs and big jobs, in large-scale cluster system using temporal or spatial information of resource availability.…”
Section: Related Work
Citation type: mentioning (confidence: 99%)
“…For example, the job whose running time exceeds one hour (it can be pre-specified) and whose running success ratio is less than 90% (it can be pre-specified) can be defined as the long job. $RT_{ij}$ denotes the running time of job $i$ on resource node $j$ and can be obtained according to (9).…”
Section: Checkpoint Job Choosing Algorithm
Citation type: mentioning (confidence: 99%)
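The long-job rule in the last statement can be written as a simple predicate. The sketch below is hypothetical: `JobStats` and `is_long_job` are illustrative names, the thresholds default to the example values (running time above one hour, success ratio below 90%), and the running time is passed in directly because equation (9), which defines the running time of job i on node j, is not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class JobStats:
    """Hypothetical per-job record; field names are illustrative."""
    running_time_h: float  # running time of the job on its node, in hours
    success_ratio: float   # fraction of past runs that completed, in [0, 1]


def is_long_job(job: JobStats, min_hours: float = 1.0, max_success: float = 0.90) -> bool:
    """Return True if the job runs longer than min_hours AND succeeds less often than max_success.

    Both thresholds are pre-specifiable; the defaults follow the example in the quoted text.
    """
    return job.running_time_h > min_hours and job.success_ratio < max_success


# Usage with made-up jobs.
jobs = {
    "render": JobStats(running_time_h=2.5, success_ratio=0.80),
    "backup": JobStats(running_time_h=2.5, success_ratio=0.95),
    "report": JobStats(running_time_h=0.5, success_ratio=0.50),
}
for name, stats in jobs.items():
    print(name, is_long_job(stats))
```

Only "render" qualifies: "backup" fails the success-ratio condition and "report" fails the running-time condition. The comparisons are strict, so a job of exactly one hour or exactly 90% success is not classified as long.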