2018
DOI: 10.1109/tpds.2017.2779872

Non-Asymptotic Delay Bounds for Multi-Server Systems with Synchronization Constraints

Abstract: Multi-server systems have received increasing attention with important implementations such as Google MapReduce, Hadoop, and Spark. Common to these systems are a fork operation, where jobs are first divided into tasks that are processed in parallel, and a later join operation, where completed tasks wait until the results of all tasks of a job can be combined and the job leaves the system. The synchronization constraint of the join operation makes the analysis of fork-join systems challenging and few explicit r…
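The fork and join operations described in the abstract can be illustrated with a small sketch. It is not taken from the paper. It assumes the standard fork-join model in which task i of every job is bound to server i, each server works through its own FIFO queue, and a job leaves once its slowest task finishes. The function name and arguments are hypothetical.

```python
def fork_join_sojourn_times(arrivals, services):
    """Sojourn time of each job in a fork-join system (illustrative sketch).

    arrivals: non-decreasing job arrival times.
    services: services[n][i] is the service time of task i of job n;
              task i always runs on server i (assumption of this sketch).
    """
    k = len(services[0])
    server_free = [0.0] * k  # time at which each server finishes its last task
    sojourn = []
    for a, tasks in zip(arrivals, services):
        # Fork: one task per server, queued behind that server's earlier tasks.
        for i, s in enumerate(tasks):
            server_free[i] = max(a, server_free[i]) + s
        # Join: the job departs when the last of its tasks completes.
        sojourn.append(max(server_free) - a)
    return sojourn


if __name__ == "__main__":
    # Job 0 has a 5-unit straggler on server 0; job 1 queues behind it.
    print(fork_join_sojourn_times([0, 1], [[5, 1], [1, 1]]))  # [5, 5]
```

In this example the second job needs only one time unit per task, yet its sojourn time is 5 because its first task waits behind the straggler. That waiting is the effect of the synchronization constraint the abstract refers to.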

Citations: cited by 21 publications (23 citation statements)
References: 46 publications (117 reference statements)
“…Compared to the fork-join model, where tasks are bound to particular servers and a large task can block tasks of subsequent jobs, in the single-queue fork-join model small jobs can overtake jobs with large straggler tasks. Mean sojourn times for such systems are derived in [25], and bounds on the sojourn time are derived using network calculus in [17].…”
Section: Systems Models and Stability Regions (mentioning)
confidence: 99%
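The overtaking behaviour described in this statement can be sketched as follows. This is an illustrative simulation, not code from either paper. It assumes that all tasks wait in one FIFO queue and that each task goes to whichever server becomes free first. The function name and arguments are hypothetical.

```python
import heapq


def single_queue_fork_join_sojourn_times(arrivals, services, num_servers):
    """Sojourn time of each job in a single-queue fork-join system (sketch).

    arrivals: non-decreasing job arrival times.
    services: services[n] lists the task service times of job n.
    Tasks are taken from one FIFO queue by the earliest available server.
    """
    free_at = [0.0] * num_servers  # min-heap of server availability times
    heapq.heapify(free_at)
    sojourn = []
    for a, tasks in zip(arrivals, services):
        finish = a
        for s in tasks:
            # The next task in the queue starts on the earliest free server.
            start = max(a, heapq.heappop(free_at))
            end = start + s
            heapq.heappush(free_at, end)
            finish = max(finish, end)
        # Join: the job leaves when its last task completes.
        sojourn.append(finish - a)
    return sojourn


if __name__ == "__main__":
    # Job 0 has a 5-unit straggler; job 1 has two 1-unit tasks.
    print(single_queue_fork_join_sojourn_times([0, 1], [[5, 1], [1, 1]], 2))  # [5, 2]
```

Here the second job arrives at time 1 and leaves at time 3, before the first job leaves at time 5, so the small job overtakes the one with the straggler. With one queue per server and tasks bound to servers, the same input would give a sojourn time of 5 for both jobs.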
“…3 shows how job sojourn time scales with the number of servers for these three models in the case with k = l and exponential inter-arrival and task service times. The plot shows performance bounds derived using network calculus in [16] and [17], simulation results, and experimental results from an Apache Spark cluster, and demonstrates that a Spark system may behave like any of these three parallel models, depending on how it is configured and how the driver program behaves. For comparison, the plot includes the equivalent sojourn time statistics for the ideal job partition.…”
Section: Systems Models and Stability Regions (mentioning)
confidence: 99%
“…Note that ours is a more general model that considers not only transmission failures (erasures) but also transmission errors due to imperfect communication links or fading channels. The system models used in the literature on distributed max-plus systems [22], [23], distributed detection and target tracking [24], distributed sensor fusion [25], and multi-agent control systems [21] resemble our model.…”
Section: System Model (mentioning)
confidence: 99%
“…Transient and steady-state solutions of the FJ queue in terms of virtual waiting times are obtained in [21]. Network calculus techniques have also been used to derive bounds [22], [23]. Results for FJ systems with two servers having exponential service times under Poissonian job arrivals are shown in [24].…”
Section: Related Work (mentioning)
confidence: 99%