Abstract Proceedings of the 2021 ACM SIGMETRICS / International Conference on Measurement and Modeling of Computer Systems 2021
DOI: 10.1145/3410220.3453924

Zero Queueing for Multi-Server Jobs

Abstract: Cloud computing today is dominated by multi-server jobs. These are jobs that request multiple servers simultaneously and hold onto all of these servers for the duration of the job. Multi-server jobs add a lot of complexity to the traditional one-server-per-job model: an arrival might not "fit" into the available servers and might have to queue, blocking later arrivals and leaving servers idle. From a queueing perspective, almost nothing is understood about multiserver job queueing systems; even understanding t…
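
The blocking effect described in the abstract can be illustrated with a small sketch. It assumes a first-come-first-served (FCFS) admission rule, which the abstract does not name explicitly; the function `fcfs_admit` and its parameters are hypothetical names chosen for this illustration, not taken from the paper.

```python
from collections import deque


def fcfs_admit(total_servers, server_needs):
    """Admit waiting jobs in FCFS order until the head-of-line job does not fit.

    total_servers: number of servers in the system (all assumed free here).
    server_needs:  server requirement of each waiting job, in arrival order.
    Returns (admitted, waiting, idle_servers).
    """
    free = total_servers
    queue = deque(server_needs)
    admitted = []
    # Assumed FCFS rule: only the job at the head of the queue may start.
    while queue and queue[0] <= free:
        need = queue.popleft()
        free -= need  # the job holds all of its servers for its whole duration
        admitted.append(need)
    return admitted, list(queue), free


# With 8 servers, the jobs needing 3 and 4 servers start. The job needing 2
# does not fit in the 1 remaining server, so it waits and also blocks the
# job needing 1 server, even though that job would fit.
admitted, waiting, idle = fcfs_admit(8, [3, 4, 2, 1])
print(admitted, waiting, idle)
```

In this example one server stays idle while two jobs wait, which is the inefficiency the abstract attributes to multi-server jobs.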

Cited by 13 publications
(14 citation statements)
References 11 publications
“…ISSP, like the state-space collapse in the heavy-traffic analysis, is a general technique that may be used to study other complex stochastic systems, e.g. large-system insensitivity of load balancing algorithms for other models like those studied in [29,39,40,37,38].…”
Section: Main Contributions (mentioning)
confidence: 99%
“…Significant processes have been made over the past few years on understanding achieving asymptotic zero-waiting (as the system size approaches infinity) in a large-scale data center with distributed queues, including the classic supermarket model [14,8,32,17,3,4,30,24,25,23,22,45,9], models with data locality [40,31] and models where each job consists of parallel tasks [39,37,19], etc.…”
Section: Introduction (mentioning)
confidence: 99%
“…A recent advance in understanding the delay of multiserver jobs is a characterization of the queueing probability in a large system by Wang et al [42], where the queueing probability is the probability that an arriving job has to queue rather than entering service immediately. Specifically, Wang et al [42] consider a multiserver job system with 𝑛 servers, and study the asymptotic scaling regime where 𝑛 becomes large. The scaling regime also allows server needs and arrival rates of jobs to scale with 𝑛 to capture the trend that the server needs in practice can be large and heterogeneous.…”
Section: Introduction (mentioning)
confidence: 99%
“…Exact Markov chain methods suffer from the curse of dimensionality as the system grows large [18,29,44]. Asymptotic methods such as fluid and diffusion limits are often only applicable at high load, many servers, or both [45,47,50,51]. Lindley-type recursions can only be applied when the job completion process has a specific structure, renewing after every arrival [22,25].…”
Section: Introduction (mentioning)
confidence: 99%