We consider a system of N parallel single-server queues with unit exponential service rates and a single dispatcher where tasks arrive as a Poisson process of rate λ(N). When a task arrives, the dispatcher assigns it to a server with the shortest queue among d(N) randomly selected servers (1 ≤ d(N) ≤ N). This load balancing strategy is referred to as a JSQ(d(N)) scheme, reflecting that it subsumes the celebrated Join-the-Shortest-Queue (JSQ) policy as a crucial special case for d(N) = N. We construct a stochastic coupling to bound the difference in the queue length processes between the JSQ policy and a JSQ(d(N)) scheme with an arbitrary value of d(N). We use the coupling to derive the fluid limit in the regime where λ(N)/N → λ < 1 as N → ∞ with d(N) → ∞, along with the associated fixed point. The fluid limit turns out not to depend on the exact growth rate of d(N), and in particular coincides with that for the JSQ policy. We further leverage the coupling to establish that the diffusion limit in the critical regime where (N − λ(N))/√N → β > 0 as N → ∞ with d(N)/(√N log(N)) → ∞ corresponds to that for the JSQ policy. These results indicate that the optimality of the JSQ policy can be preserved at the fluid level and diffusion level while reducing the overhead by nearly a factor O(N) and O(√N/log(N)), respectively. * d.mukherjee@tue.nl
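The JSQ(d(N)) assignment rule described above can be sketched in a few lines. The following is a minimal illustrative simulation, not the coupling construction from the paper; the discrete-time departure model (one completion per step at a uniformly chosen busy server) is a simplifying assumption, not the unit-exponential CTMC analyzed in the abstract.

```python
import random

def jsq_d_assign(queues, d, rng):
    """Assign one task under JSQ(d): sample d distinct servers
    uniformly at random and join the shortest queue among them."""
    sampled = rng.sample(range(len(queues)), d)
    target = min(sampled, key=lambda i: queues[i])
    queues[target] += 1
    return target

def simulate(n_servers, d, arrival_prob, horizon, seed=0):
    """Crude discrete-time sketch: each step, an arrival occurs with
    probability arrival_prob and one busy server (uniform among busy
    ones) completes a task. Illustration only, not a faithful CTMC."""
    rng = random.Random(seed)
    queues = [0] * n_servers
    for _ in range(horizon):
        if rng.random() < arrival_prob:
            jsq_d_assign(queues, d, rng)
        busy = [i for i, q in enumerate(queues) if q > 0]
        if busy:
            queues[rng.choice(busy)] -= 1
    return queues
```

Note that for d = len(queues) the sampling covers all servers, so the rule reduces to the plain JSQ policy, matching the special case d(N) = N in the abstract.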
We consider a system of N parallel queues with identical exponential service rates and a single dispatcher where tasks arrive as a Poisson process. When a task arrives, the dispatcher always assigns it to an idle server, if there is any, and to a server with the shortest queue among d randomly selected servers otherwise (1 ≤ d ≤ N). This load balancing scheme subsumes the so-called Join-the-Idle-Queue (JIQ) policy (d = 1) and the celebrated Join-the-Shortest-Queue (JSQ) policy (d = N) as two crucial special cases. We develop a stochastic coupling construction to obtain the diffusion limit of the queue process in the Halfin-Whitt heavy-traffic regime, and establish that it does not depend on the value of d, implying that assigning tasks to idle servers is sufficient for diffusion-level optimality. * Corresponding author: d.mukherjee@tue.nl
Load balancing schemes can be broadly categorized as static (open-loop), dynamic (closed-loop), or some intermediate blend, depending on the amount of real-time feedback or state information (e.g. queue lengths or load measurements) that is used in assigning tasks. Within the category of dynamic policies, one can further distinguish between push-based and pull-based approaches, depending on whether the initiative resides with a dispatcher actively collecting feedback from the servers, or with the servers advertising their availability or load status. The use of state information naturally allows dynamic policies to achieve better performance and greater resource pooling gains, but also involves higher implementation complexity and potentially substantial communication overhead.
The latter issue is particularly pertinent in large-scale data centers, which deploy thousands of servers and handle massive demands, with service requests coming in at huge rates. In the present paper we focus on a basic scenario of N parallel queues with identical servers, exponentially distributed service requirements, and a service discipline at each individual server that is oblivious to the actual service requirements (e.g. FCFS). In this canonical case, the so-called Join-the-Shortest-Queue (JSQ) policy has several strong optimality properties, and in particular minimizes the overall mean delay among the class of non-anticipating load balancing policies that do not have any advance knowledge of the service requirements [3,16,18]. (Relaxing any of the three above-mentioned assumptions tends to break the optimality properties of the JSQ policy, and renders the delay-minimizing policy quite complex or even counter-intuitive, see for instance [5,7,17].) In order to implement the JSQ policy, a dispatcher requires instantaneous knowledge of the queue lengths at all the servers, which may give rise to a substantial communication burden, and may not be scalable in scenarios with large numbers of servers. The latter issue has motivated consideration of so-called JSQ(d) policies, where the dispatcher assigns an incoming task to a server with the shortest queue among d servers selected uniformly at random. Me...
We consider a system of N servers inter-connected by some underlying graph topology G_N. Tasks with unit-mean exponential processing times arrive at the various servers as independent Poisson processes of rate λ. Each incoming task is irrevocably assigned to whichever server has the smallest number of tasks among the one where it appears and its neighbors in G_N. The above model arises in the context of load balancing in large-scale cloud networks and data centers, and has been extensively investigated in the case where G_N is a clique. Since the servers are exchangeable in that case, mean-field limits apply, and in particular it has been proved that for any λ < 1, the fraction of servers with two or more tasks vanishes in the limit as N → ∞. For an arbitrary graph G_N, mean-field techniques break down, complicating the analysis, and the queue length process tends to be worse than for a clique. Accordingly, a graph G_N is said to be N-optimal or √N-optimal when the queue length process on G_N is equivalent to that on a clique on an N-scale or √N-scale, respectively. We prove that if G_N is an Erdős-Rényi random graph with average degree d(N), then with high probability it is N-optimal and √N-optimal if d(N) → ∞ and d(N)/(√N log(N)) → ∞ as N → ∞, respectively. This demonstrates that optimality can be maintained at N-scale and √N-scale while reducing the number of connections by nearly a factor N and √N/log(N) compared to a clique, provided the topology is suitably random. It is further shown that if G_N contains Θ(N) bounded-degree nodes, then it cannot be N-optimal. In addition, we establish that an arbitrary graph G_N is N-optimal when its minimum degree is N − o(N), and may not be N-optimal even when its minimum degree is cN + o(N) for any 0 < c < 1/2. Simulation experiments are conducted for various scenarios to corroborate the asymptotic results.
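The graph-based assignment rule described above admits a compact sketch: a task appearing at a node joins the shortest queue among that node and its neighbors. The adjacency representation and deterministic tie-breaking (lowest index) below are illustrative assumptions, not details from the paper.

```python
def assign_on_graph(adj, queues, arrival_node):
    """A task appearing at arrival_node joins the shortest queue among
    the node itself and its neighbors in the graph G_N.
    Ties are broken toward the lowest index (an assumption for
    determinism; the model does not prescribe a tie-break rule)."""
    candidates = [arrival_node] + list(adj[arrival_node])
    target = min(candidates, key=lambda i: (queues[i], i))
    queues[target] += 1
    return target
```

When `adj` encodes a clique, every server is a candidate and the rule coincides with JSQ on the full system, which is the clique special case the abstract compares against.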
* d.mukherjee@tue.nl; † s.c.borst@tue.nl; ‡ j.s.h.v.leeuwaarden@tue.nl
arXiv:1707.05866v2 [math.PR] 6 Apr 2019
Related work. The above model has been studied in [11,28], focusing on certain fixed-degree graphs and in particular ring topologies. The results demonstrate that the flexibility to forward tasks to a few neighbors, or even just one, with possibly shorter queues significantly improves the performance in terms of the waiting time and tail distribution of the queue length. This resembles the so-called 'power-of-two' effect in the classical case of a complete graph where tasks are assigned to the shortest queue among d servers selected uniformly at random. As shown by Mitzenmacher [16,17] and Vvedenskaya et al. [31], such a 'power-of-d' scheme provides a huge performance improvement over purely random assignment, even when d = 2, in particular super-exponential tail decay, translating into far better waiting-time performance. Further related
We consider a variation of the supermarket model in which the servers can communicate with their neighbors and where the neighborhood relationships are described in terms of a suitable graph. Tasks with unit-exponential service time distributions arrive at each vertex as independent Poisson processes with rate λ, and each task is irrevocably assigned to the shortest queue among the one at which it first appears and those of d − 1 randomly selected neighbors. This model has been extensively studied when the underlying graph is a clique, in which case it reduces to the well-known power-of-d scheme. In particular, results of Mitzenmacher (1996) and Vvedenskaya et al. (1996) show that as the size of the clique gets large, the occupancy process associated with the queue lengths at the various servers converges to a deterministic limit described by an infinite system of ordinary differential equations (ODE). In this work, we consider settings where the underlying graph need not be a clique and is allowed to be suitably sparse. We show that if the minimum degree approaches infinity (however slowly) as the number of servers N approaches infinity, and the ratio between the maximum degree and the minimum degree in each connected component approaches 1 uniformly, the occupancy process converges to the same system of ODE as the classical supermarket model. In particular, the asymptotic behavior of the occupancy process is insensitive to the precise network topology. We also study the case where the graph sequence is random, with the N-th graph given as an Erdős-Rényi random graph on N vertices with average degree c(N). Annealed convergence of the occupancy process to the same deterministic limit is established under the condition c(N) → ∞, and under a stronger condition c(N)/ln N → ∞, convergence (in probability) is shown for almost every realization of the random graph.
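The mean-field ODE system referenced above is classical: with s_k(t) the fraction of servers holding at least k tasks, the power-of-d limit satisfies ds_k/dt = λ(s_{k-1}^d − s_k^d) − (s_k − s_{k+1}) with s_0 ≡ 1, and its unique fixed point is s_k = λ^{(d^k − 1)/(d − 1)}, the doubly exponential tail from Mitzenmacher and Vvedenskaya et al. A forward-Euler integration (the truncation level K and step size dt below are illustrative choices) recovers this fixed point numerically:

```python
def supermarket_ode(lam, d, K=20, dt=0.01, T=200.0):
    """Forward-Euler integration of the power-of-d mean-field ODE
        ds_k/dt = lam*(s_{k-1}^d - s_k^d) - (s_k - s_{k+1}),
    with s_0 = 1, truncated at level K (s_{K+1} = 0), starting
    from an empty system. Returns [s_1, ..., s_K] at time T."""
    s = [0.0] * (K + 2)      # s[k] = fraction of queues with >= k tasks
    s[0] = 1.0
    for _ in range(int(T / dt)):
        new = s[:]
        for k in range(1, K + 1):
            new[k] = s[k] + dt * (lam * (s[k-1]**d - s[k]**d)
                                  - (s[k] - s[k+1]))
        s = new
    return s[1:K + 1]
```

For d = 2 and λ = 0.7 this converges to s_1 = 0.7, s_2 = 0.7³, s_3 = 0.7⁷, illustrating the super-exponential decay that a clique (or, per the abstract, any suitably dense graph) achieves.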
In the context of load balancing problems on graphs, [12,28] examine the performance on certain fixed-degree graphs and in particular ring topologies. Their results demonstrate that the flexibility to forward tasks to a few neighbors, or even just one, with possibly shorter queues significantly improves the performance in terms of the waiting time and tail distribution of the queue length. This is similar to the power-of-two effect in the setting of cliques, but the re-
Consider a system of N parallel single-server queues with unit-exponential service time distribution and a single dispatcher where tasks arrive as a Poisson process of rate λ(N). When a task arrives, the dispatcher assigns it to one of the servers according to the Join-the-Shortest-Queue (JSQ) policy. Eschenfeldt and Gamarnik (2015) established that in the Halfin-Whitt regime where (N − λ(N))/√N → β > 0 as N → ∞, the appropriately scaled occupancy measure of the system under the JSQ policy converges weakly on any finite time interval to a certain diffusion process as N → ∞. Recently, it was further established by Braverman (2018) that the convergence result extends to the steady state as well, i.e., the stationary occupancy measure of the system converges weakly to the steady state of the diffusion process as N → ∞, proving the interchange of limits result. In this paper we perform a detailed analysis of the steady state of the above diffusion process. Specifically, we establish precise tail asymptotics of the stationary distribution and the scaling of extrema of the process on large time intervals. Our results imply that the asymptotic steady-state scaled number of servers with queue length two or larger exhibits an exponential tail, whereas that for the number of idle servers turns out to be Gaussian. From the methodological point of view, the diffusion process under consideration goes beyond the state-of-the-art techniques in the study of the steady state of diffusion processes. The lack of any closed-form expression for the steady state and the intricate interdependency of the process dynamics on its local times make the analysis significantly challenging. We develop a technique involving the theory of regenerative processes that provides a tractable form for the stationary measure, and in conjunction with several sharp hitting time estimates, acts as a key vehicle in establishing the results.
The technique and the intermediate results might be of independent interest, and can possibly be used in understanding the bulk behavior of the process.
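To give a feel for the Gaussian steady-state behavior mentioned above, the following is a deliberately simplified sketch: an Euler-Maruyama path of an Ornstein-Uhlenbeck process, whose stationary law is N(0, σ²/2θ). This is NOT the JSQ diffusion analyzed in the paper (whose drift involves local times and admits no closed-form stationary distribution); it only illustrates how a linear restoring drift produces a Gaussian stationary law, the qualitative behavior established for the idle-server component.

```python
import math
import random

def ou_sample(theta, sigma, dt, n_steps, seed=0):
    """Euler-Maruyama discretization of the OU process
        dX = -theta * X dt + sigma dW,
    started at 0. Its stationary law is N(0, sigma^2 / (2*theta)).
    Illustrative only: the actual JSQ diffusion has local-time terms."""
    rng = random.Random(seed)
    x, path = 0.0, []
    for _ in range(n_steps):
        x += -theta * x * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        path.append(x)
    return path
```

For theta = sigma = 1 the long-run empirical variance of the path approaches 1/2, consistent with the Gaussian stationary law.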
A fundamental challenge in large-scale cloud networks and data centers is to achieve highly efficient server utilization and limit energy consumption, while providing excellent user-perceived performance in the presence of uncertain and time-varying demand patterns. Auto-scaling provides a popular paradigm for automatically adjusting service capacity in response to demand while meeting performance targets, and queue-driven auto-scaling techniques have been widely investigated in the literature. In typical data center architectures and cloud environments, however, no centralized queue is maintained, and load balancing algorithms immediately distribute incoming tasks among parallel queues. In these distributed settings with vast numbers of servers, centralized queue-driven auto-scaling techniques involve a substantial communication overhead and major implementation burden, or may not even be viable at all. Motivated by the above issues, we propose a joint auto-scaling and load balancing scheme which does not require any global queue length information or explicit knowledge of system parameters, and yet provides provably near-optimal service elasticity. We establish the fluid-level dynamics for the proposed scheme in a regime where the total traffic volume and nominal service capacity grow large in proportion. The fluid-limit results show that the proposed scheme achieves asymptotic optimality in terms of user-perceived delay performance as well as energy consumption. Specifically, we prove that both the waiting time of tasks and the relative energy portion consumed by idle servers vanish in the limit. At the same time, the proposed scheme operates in a distributed fashion and involves only constant communication overhead per task, thus ensuring scalability in massive data center operations.
Extensive simulation experiments corroborate the fluid-limit results, and demonstrate that the proposed scheme can match the user performance and energy consumption of state-of-the-art approaches that do take full advantage of a centralized queue.
We consider a system of N identical server pools and a single dispatcher where tasks arrive as a Poisson process of rate λ(N). Arriving tasks cannot be queued, and must immediately be assigned to one of the server pools to start execution, or discarded. The execution times are assumed to be exponentially distributed with unit mean, and do not depend on the number of other tasks receiving service. However, the experienced performance (e.g. in terms of received throughput) does degrade with an increasing number of concurrent tasks at the same server pool. The dispatcher therefore aims to evenly distribute the tasks across the various server pools. Specifically, when a task arrives, the dispatcher assigns it to the server pool with the minimum number of tasks among d(N) randomly selected server pools. This assignment strategy is called the JSQ(d(N)) scheme, as it resembles the power-of-d version of the Join-the-Shortest-Queue (JSQ) policy, and will also be referred to as such in the special case d(N) = N. We construct a stochastic coupling to bound the difference in the system occupancy processes between the JSQ policy and a scheme with an arbitrary value of d(N). We use the coupling to derive the fluid limit in case d(N) → ∞ and λ(N)/N → λ as N → ∞, along with the associated fixed point. The fluid limit turns out to be insensitive to the exact growth rate of d(N), and coincides with that for the JSQ policy. We further leverage the coupling to establish that the diffusion limit corresponds to that for the JSQ policy as well, as long as d(N)/(√N log(N)) → ∞, and characterize the common limiting diffusion process. These results indicate that the optimality of the JSQ policy can be preserved at the fluid level and diffusion level while reducing the overhead by nearly a factor O(N) and O(√N/log(N)), respectively. * d.mukherjee@tue.nl
We present an overview of scalable load balancing algorithms which provide favorable delay performance in large-scale systems, and yet only require minimal implementation overhead. Aimed at a broad audience, the paper starts with an introduction to the basic load balancing scenario, referred to as the supermarket model, consisting of a single dispatcher where tasks arrive that must immediately be forwarded to one of N single-server queues. The supermarket model is a dynamic counterpart of the classical balls-and-bins setup where balls must be sequentially distributed across bins. A popular class of load balancing algorithms are so-called power-of-d or JSQ(d) policies, where an incoming task is assigned to a server with the shortest queue among d servers selected uniformly at random. As the name reflects, this class includes the celebrated Join-the-Shortest-Queue (JSQ) policy as a special case (d = N), which has strong stochastic optimality properties and yields a mean waiting time that vanishes as N grows large for any fixed subcritical load. However, a nominal implementation of the JSQ policy involves a prohibitive communication burden in large-scale deployments. In contrast, a simple random assignment policy (d = 1) does not entail any communication overhead, but the mean waiting time remains constant as N grows large for any fixed positive load. In order to examine the fundamental trade-off between delay performance and implementation overhead, we consider an asymptotic regime where the diversity parameter d(N) depends on N. We investigate what growth rate of d(N) is required to match the optimal performance of the JSQ policy on fluid and diffusion scale, and achieve a vanishing waiting time in the limit.
The results demonstrate that the asymptotics for the JSQ(d(N)) policy are insensitive to the exact growth rate of d(N), as long as the latter is sufficiently fast, implying that the optimality of the JSQ policy can asymptotically be preserved while dramatically reducing the communication overhead. Stochastic coupling techniques play an instrumental role in establishing the asymptotic optimality and universality properties, and augmentations of the coupling constructions allow these properties to be extended to infinite-server settings and network scenarios. We additionally show how the communication overhead can be reduced yet further by the so-called Join-the-Idle-Queue (JIQ) scheme, leveraging memory at the dispatcher to keep track of idle servers. In the present paper we review scalable load balancing algorithms (LBAs) which achieve excellent delay performance in large-scale systems and yet only involve low implementation overhead. LBAs play a critical role in distributing service requests or tasks (e.g. compute jobs, database lookups, file transfers) among servers or distributed resources in parallel-processing systems. The analysis and design of LBAs has attracted strong attention in recent years, mainly spurred by crucial scalability challenges arising in cloud networks and data centers with massive...