2012
DOI: 10.1109/tpds.2011.285

Performance and Reliability of Non-Markovian Heterogeneous Distributed Computing Systems

Abstract: Average service time, quality-of-service (QoS), and service reliability associated with heterogeneous parallel and distributed computing systems (DCSs) are analytically characterized in a realistic setting for which tangible, stochastic communication delays are present with nonexponential distributions. The departure from the traditionally assumed exponential distributions for event times, such as task-execution times, communication arrival times and load-transfer delays, gives rise to a non-Markovian…
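Why non-exponential event times matter can be illustrated with a small Monte Carlo sketch. This is a hypothetical illustration, not the authors' analytical model. It assumes uniformly distributed task times and Weibull-distributed server failure times, both chosen only as examples of non-exponential laws. It ignores communication and load-transfer delays and estimates service reliability as the probability that every server finishes its own queue before it fails. All names (`service_reliability`, `task_time`, `weibull_scale`, and so on) are made up for this example.

```python
import random
from typing import Sequence


def task_time(rng: random.Random, mean: float) -> float:
    """Hypothetical non-exponential task time: uniform on [0.5*mean, 1.5*mean]."""
    return rng.uniform(0.5 * mean, 1.5 * mean)


def service_reliability(loads: Sequence[int],
                        mean_task_time: Sequence[float],
                        weibull_scale: Sequence[float],
                        weibull_shape: float,
                        trials: int = 10_000,
                        seed: int = 0) -> float:
    """Monte Carlo estimate of P(every server finishes its queue before failing).

    No reallocation and no communication delays are modeled; each server
    works through its own tasks back to back.
    """
    rng = random.Random(seed)
    successes = 0
    for _ in range(trials):
        ok = True
        for m, mu, scale in zip(loads, mean_task_time, weibull_scale):
            finish = sum(task_time(rng, mu) for _ in range(m))
            # Weibull failure time: memoryless only when the shape is 1.
            fail_at = rng.weibullvariate(scale, weibull_shape)
            if fail_at < finish:
                ok = False
                break
        successes += ok
    return successes / trials


print(service_reliability([10, 5], [1.0, 2.0], [40.0, 40.0], 2.0))
```

With a Weibull shape other than 1, the failure rate depends on how long a server has already been running. A Markov chain cannot capture that kind of memory without expanding its state space. Sampling sidesteps the problem at the cost of statistical error.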


Cited by 6 publications (6 citation statements).
References: 34 publications (92 reference statements).
“…Although reliability in distributed computing has been studied before [175], standard fault tolerance and reliability approaches cannot be directly applied in Cloud computing systems. The scale and expected reliability of Cloud computing are increasingly important but hard to analyse due to the range of inter-related characteristics, e.g.…”
Section: Reliability; citation type: mentioning
confidence: 99%
“…These algorithms are all static on the assumption that job arrival and execution conform to queueing theory. Actually, they may fail under non‐Markovian environment. The work implements a dynamic scheduling for coordination.…”
Section: Related Work; citation type: mentioning
confidence: 99%
“…Actually, they may fail under non-Markovian environment [31]. The work [32] implements a dynamic scheduling for coordination. But it focuses on making up for inaccurate run-time estimates to improve the response time.…”
Section: Related Work; citation type: mentioning
confidence: 99%
“…In [9], [28] we developed a flexible class of DTR policies. Each policy in the class estimates, at $t = t_b$, the amount of load imbalance, $L^{ex}_j(t_b)$, that each server has with respect to the estimated total system load, $M_i(t_b)$.…”
Section: Correlated-failure-aware Distributed Task Reallocation Policy; citation type: mentioning
confidence: 99%
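A DTR policy of the kind quoted above starts from a per-server excess-load estimate. The sketch below is hypothetical. The excerpt does not give the formula, so this code assumes that server $j$'s excess is its queue length minus its processing-rate-proportional share of the total load, $L^{ex}_j(t_b) = Q_j(t_b) - \frac{\lambda_j}{\sum_k \lambda_k} M_i(t_b)$. It also assumes that overloaded servers send a fraction `gain` of their excess to underloaded servers in proportion to each one's deficit. Here $M_i(t_b)$ is simply the sum of the known queues. The names `excess_load`, `transfer_plan`, and `gain` are illustrative only.

```python
from typing import Dict, List


def excess_load(queues: List[float], rates: List[float]) -> List[float]:
    """Excess load of each server relative to its rate-proportional fair share.

    Assumption: the total-load estimate M_i(t_b) is the sum of the queue
    lengths, and server j's fair share is rates[j] / sum(rates) of it.
    """
    total_load = sum(queues)          # stands in for M_i(t_b)
    total_rate = sum(rates)
    return [q - (r / total_rate) * total_load for q, r in zip(queues, rates)]


def transfer_plan(queues: List[float], rates: List[float],
                  gain: float = 1.0) -> Dict[int, Dict[int, float]]:
    """Hypothetical plan: plan[j][k] = tasks server j sends to server k."""
    excess = excess_load(queues, rates)
    # Under-loaded servers and how far each is below its fair share.
    deficits = {k: -e for k, e in enumerate(excess) if e < 0}
    total_deficit = sum(deficits.values())
    plan: Dict[int, Dict[int, float]] = {}
    for j, e in enumerate(excess):
        if e > 0 and total_deficit > 0:
            # Split a fraction `gain` of j's excess in proportion to the deficits.
            plan[j] = {k: gain * e * d / total_deficit for k, d in deficits.items()}
    return plan


print(transfer_plan([40, 10, 10], [2, 1, 1]))  # → {0: {1: 5.0, 2: 5.0}}
```

The excesses always sum to zero, so with `gain = 1` each underloaded server receives exactly its deficit. A gain below 1 is one plausible way to hedge against load estimates that are stale because of communication delays.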
“…This paper has two contributions: 1) modeling the service reliability of applications executed on DCSs in the presence of correlated component failures by means of a hybrid analytical and MC-based approach, and 2) optimizing the service reliability by means of DTR policies. The service reliability is modeled by extending our analytical non-Markovian model in [9] to include specific group failures of CEs at each failure event. This extension enables us to calculate the reliability conditional on the occurrence of a specific realization of correlated CE failures.…”
Citation type: mentioning
confidence: 99%