Zero Queueing for Multi-Server Jobs

Wang, Weina; Xie, Qiaomin

doi:10.1145/3410220.3453924

Cited by 13 publications

(14 citation statements)

References 11 publications

“…ISSP, like the state-space collapse in the heavy-traffic analysis, is a general technique that may be used to study other complex stochastic systems, e.g. large-system insensitivity of load balancing algorithms for other models like those studied in [29,39,40,37,38].…”

Section: Main Contributions (mentioning)

confidence: 99%

“…Significant processes have been made over the past few years on understanding achieving asymptotic zero-waiting (as the system size approaches infinity) in a large-scale data center with distributed queues, including the classic supermarket model [14,8,32,17,3,4,30,24,25,23,22,45,9], models with data locality [40,31] and models where each job consists of parallel tasks [39,37,19], etc.…”

Section: Introduction (mentioning)

confidence: 99%

“…However, almost all these results assume exponential service time distributions. While each of these results [14,8,32,17,29,30,24,25,23,22,40,31,39,37,19,45,9] provided important insights of achieving zero-waiting in a practical system, theoretically, it is not clear whether these principles hold for general service times. This is a very important question to answer because it is well-known that service time distributions in real-world systems are not exponential.…”

Section: Introduction (mentioning)

confidence: 99%

Large-System Insensitivity of Zero-Waiting Load Balancing Algorithms

Liu, Gong, Lü

2022

Preprint

This paper studies the sensitivity (or insensitivity) of a class of load balancing algorithms that achieve asymptotic zero-waiting in the sub-Halfin-Whitt regime [24], named LB-zero. Most existing results on zero-waiting load balancing algorithms assume the service time distribution is exponential. This paper establishes the large-system insensitivity of LB-zero for jobs whose service time follows a Coxian distribution with a finite number of phases. This result suggests that LB-zero achieves asymptotic zero-waiting for a large class of service time distributions, which is confirmed in our simulations. To prove this result, this paper develops a new technique, called "Iterative State-Space Peeling" (or ISSP for short). ISSP first identifies an iterative relation between the upper and lower bounds on the queue states and then proves that the system lives near the fixed point of the iterative bounds with a high probability. Based on ISSP, the steady-state distribution of the system is further analyzed by applying Stein's method in the neighborhood of the fixed point. ISSP, like state-space collapse in heavy-traffic analysis, is a general approach that may be used to study other complex stochastic systems.
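As a rough illustration of the service-time model named in the abstract above, the following Python sketch samples from a Coxian distribution with a finite number of phases. The function names, the phase rates, and the continuation probabilities are hypothetical choices for illustration and are not taken from the paper.

```python
import random


def coxian_mean(rates, cont_probs):
    """Mean of a Coxian distribution.

    rates[i] is the exponential rate of phase i; cont_probs[i] is the
    probability of moving from phase i to phase i + 1 (otherwise service ends).
    """
    assert len(cont_probs) == len(rates) - 1
    mean, reach = 0.0, 1.0  # reach = probability that phase i is visited
    for i, rate in enumerate(rates):
        mean += reach / rate
        if i < len(cont_probs):
            reach *= cont_probs[i]
    return mean


def sample_coxian(rates, cont_probs, rng):
    """Draw one service time by walking through the phases in order."""
    total = 0.0
    for i, rate in enumerate(rates):
        total += rng.expovariate(rate)
        # Stop after the last phase, or when the job exits early.
        if i == len(rates) - 1 or rng.random() >= cont_probs[i]:
            break
    return total


if __name__ == "__main__":
    rng = random.Random(0)
    rates, cont_probs = [1.0, 2.0], [0.5]  # illustrative two-phase example
    draws = [sample_coxian(rates, cont_probs, rng) for _ in range(50000)]
    print(coxian_mean(rates, cont_probs), sum(draws) / len(draws))
```

With a single phase the distribution reduces to the exponential case that, according to the abstract, most existing results assume.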

“…A recent advance in understanding the delay of multiserver jobs is a characterization of the queueing probability in a large system by Wang et al [42], where the queueing probability is the probability that an arriving job has to queue rather than entering service immediately. Specifically, Wang et al [42] consider a multiserver job system with 𝑛 servers, and study the asymptotic scaling regime where 𝑛 becomes large. The scaling regime also allows server needs and arrival rates of jobs to scale with 𝑛 to capture the trend that the server needs in practice can be large and heterogeneous.…”

Section: Introduction (mentioning)

confidence: 99%

Sharp Waiting-Time Bounds for Multiserver Jobs

Hong, Wang

2021

Preprint

Self Cite

Multiserver jobs, which are jobs that occupy multiple servers simultaneously during service, are prevalent in today's computing clusters. But little is known about the delay performance of systems with multiserver jobs. We consider queueing models for multiserver jobs in a scaling regime where the total number of servers in the system becomes large and meanwhile both the system load and the number of servers that a job needs scale with the total number of servers. Prior work has derived upper bounds on the queueing probability in this scaling regime. However, without proper lower bounds, the existing results cannot be used to differentiate between policies. In this paper, we study the delay performance by establishing sharp bounds on the mean waiting time of multiserver jobs, where the waiting time of a job is the time spent in queueing rather than in service. We first consider the commonly used First-Come-First-Serve (FCFS) policy and characterize the exact order of its mean waiting time. We then prove a lower bound on the mean waiting time of all policies, and demonstrate that there is an order gap between this lower bound and the mean waiting time under FCFS. We finally complement the lower bound with an achievability result: we show that under a priority policy that we call P-Priority, the mean waiting time achieves the order of the lower bound. This achievability result implies the tightness of the lower bound, the asymptotic optimality of P-Priority, and the strict suboptimality of FCFS.

CCS Concepts: • Mathematics of computing → Queueing theory; Markov processes; • Networks → Network performance analysis.
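To make the two delay metrics mentioned above concrete (queueing probability and mean waiting time), the sketch below simulates a multiserver-job system under FCFS. It assumes Poisson arrivals, exponential service times, and server needs drawn uniformly from a small list; these assumptions, the function name `simulate_fcfs`, and all parameter values are illustrative and not taken from the paper.

```python
import heapq
import random
from collections import deque


def simulate_fcfs(n, arrival_rate, needs, service_rate, num_jobs, seed=0):
    """Simulate n servers under FCFS; return (queueing probability, mean wait)."""
    assert max(needs) <= n, "every job must fit in the system"
    rng = random.Random(seed)
    free = n            # idle servers
    departures = []     # min-heap of (finish_time, servers_released)
    queue = deque()     # waiting jobs as (arrival_time, need), in arrival order
    waits = []          # waiting time of each job that has started service
    arrived = 0
    next_arrival = rng.expovariate(arrival_rate)

    while len(waits) < num_jobs:
        take_arrival = arrived < num_jobs and (
            not departures or next_arrival <= departures[0][0]
        )
        if take_arrival:
            now = next_arrival
            queue.append((now, rng.choice(needs)))
            arrived += 1
            next_arrival = now + rng.expovariate(arrival_rate)
        else:
            now, released = heapq.heappop(departures)
            free += released
        # FCFS: only the head-of-line job may start, so it can block jobs behind it.
        while queue and queue[0][1] <= free:
            arrival_time, need = queue.popleft()
            free -= need
            waits.append(now - arrival_time)
            heapq.heappush(departures, (now + rng.expovariate(service_rate), need))

    # A job "queues" if it could not enter service at its arrival instant.
    queueing_prob = sum(w > 0 for w in waits) / num_jobs
    mean_wait = sum(waits) / num_jobs
    return queueing_prob, mean_wait


if __name__ == "__main__":
    print(simulate_fcfs(n=64, arrival_rate=10.0, needs=[1, 4, 16],
                        service_rate=1.0, num_jobs=20000))
```

Because the head-of-line job blocks everything behind it, servers can sit idle while jobs wait. Such head-of-line blocking is one plausible source of the FCFS suboptimality that the abstract reports.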

“…Exact Markov chain methods suffer from the curse of dimensionality as the system grows large [18,29,44]. Asymptotic methods such as fluid and diffusion limits are often only applicable at high load, many servers, or both [45,47,50,51]. Lindley-type recursions can only be applied when the job completion process has a specific structure, renewing after every arrival [22,25].…”

Section: Introduction (mentioning)

confidence: 99%

WCFS: A new framework for analyzing multiserver systems

Grosof, Scheller-Wolf

2021

Preprint

Self Cite

Multiserver queueing systems are found at the core of a wide variety of practical systems. Unfortunately, existing tools for analyzing multiserver models have major limitations: Techniques for exact analysis often struggle with high-dimensional models, while techniques for deriving bounds are often too specialized to handle realistic system features, such as variable service rates of jobs. New techniques are needed to handle these complex, important, high-dimensional models. In this paper we introduce the work-conserving finite-skip class of models. This class includes many important models, such as the heterogeneous M/G/k, the limited processor sharing policy for the M/G/1, the threshold parallelism model, and the multiserver-job model under a simple scheduling policy. We prove upper and lower bounds on mean response time for any model in the work-conserving finite-skip class. Our bounds are separated by an additive constant, giving a strong characterization of mean response time at all loads. When specialized to each of the models above, these bounds represent the first bounds on mean response time known for each setting.
