2012
DOI: 10.1145/2382553.2382556
AutoScale

Abstract: Energy costs for data centers continue to rise, already exceeding $15 billion yearly. Sadly, much of this power is wasted. Servers are only busy 10--30% of the time on average, but they are often left on while idle, consuming 60% or more of peak power in the idle state. We introduce a dynamic capacity management policy, AutoScale, that greatly reduces the number of servers needed in data centers driven by unpredictable, time-varying load, while meeting respo…

Cited by 223 publications (15 citation statements)
References 28 publications
“…We use the techniques proposed in [13] to do both low-frequency planning and high-frequency tuning for the coarse-grained pipelines as a baseline for comparison. In this baseline, we profile the entire pipeline as a single black box to identify the single maximum batch size capable of meeting the SLO, in contrast to InferLine's per-model profiling.…”
Section: Methods (mentioning)
confidence: 99%
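The black-box baseline described in this statement can be sketched as a search for the largest batch size whose end-to-end profiled latency still meets the SLO. The sketch below assumes latency is monotonically non-decreasing in batch size; `profile_latency` is a hypothetical stand-in for running the real pipeline, not an API from the cited works.

```python
# Sketch of the black-box baseline: profile the whole pipeline at
# different batch sizes and keep the largest one still within the SLO.

def profile_latency(batch_size: int) -> float:
    """Hypothetical profiler: pretend latency grows linearly with batch size."""
    return 10.0 + 2.5 * batch_size  # milliseconds

def max_batch_under_slo(slo_ms: float, max_batch: int = 1024) -> int:
    """Binary-search the largest batch size whose profiled latency <= slo_ms,
    assuming latency is monotonically non-decreasing in batch size."""
    lo, hi, best = 1, max_batch, 0
    while lo <= hi:
        mid = (lo + hi) // 2
        if profile_latency(mid) <= slo_ms:
            best, lo = mid, mid + 1
        else:
            hi = mid - 1
    return best

print(max_batch_under_slo(100.0))  # latency(36) = 100.0 ms -> 36
```

A per-model variant (as InferLine does) would run this same search against each stage's profiler separately rather than the whole pipeline at once.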
“…Scaling Down (Algorithm 4): InferLine takes a conservative approach to scaling down the pipeline to prevent unnecessary configuration oscillation, which can cause SLO misses. Drawing on the work in [13], the Tuner waits for a period of time after any configuration change to allow the system to stabilize before considering any down-scaling actions. InferLine uses a delay of 15 seconds (3x the 5-second activation time of spinning up new replicas in the underlying prediction serving frameworks), but the precise value is unimportant as long as it provides enough time for the pipeline to stabilize after a scaling action.…”
Section: High-Frequency Tuning (mentioning)
confidence: 99%
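The stabilization rule in this statement can be sketched as a simple time gate: after any configuration change, scale-down decisions are suppressed until a delay (here 3x the replica activation time) has elapsed. Class and method names below are illustrative, not InferLine's actual API.

```python
# Sketch of the conservative scale-down gate: wait a stabilization
# period after any configuration change before allowing scale-down.

ACTIVATION_TIME_S = 5.0
STABILIZATION_DELAY_S = 3 * ACTIVATION_TIME_S  # 15 seconds

class ScaleDownGate:
    def __init__(self, delay: float = STABILIZATION_DELAY_S):
        self.delay = delay
        self.last_change = float("-inf")  # no change recorded yet

    def record_config_change(self, now: float) -> None:
        """Call whenever the pipeline configuration changes."""
        self.last_change = now

    def may_scale_down(self, now: float) -> bool:
        """True once the stabilization delay has fully elapsed."""
        return now - self.last_change >= self.delay

gate = ScaleDownGate()
gate.record_config_change(now=100.0)
print(gate.may_scale_down(now=110.0))  # False: only 10 s elapsed
print(gate.may_scale_down(now=116.0))  # True: 16 s >= 15 s
```

Scale-up actions are deliberately not gated here, matching the asymmetric caution described in the statement: reacting quickly to load increases, slowly to decreases.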
“…Instead, if we select the least loaded host, over time we will have balanced hosts, making it difficult to identify less loaded hosts to drain connections from in case of a scale-down. Prior works have shown this load-unbalancing technique to facilitate server scaling in web clusters [6,16], while we explore this in the context of stateless MME architectures with state distributed across multiple MME hosts (see §7.2).…”
Section: Selection of Final Host from Viable Hosts (mentioning)
confidence: 99%
“…On the other hand, the autoscaler is in charge of adapting the number of available resources to the incoming workload [37]. The choice of the autoscaler is critical for many reasons, in particular for pricing issues such as minimizing power consumption in a data center [38,37]. In the following, we assume that an autoscaler is in place and that the number of incoming requests does not change the number of available resources; therefore we focus only on the load-balancing algorithm.…”
Section: Related Work (mentioning)
confidence: 99%
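The autoscaler role described in this statement can be sketched as a sizing rule that maps the incoming request rate to a server count, with some headroom. The constants and the proportional rule below are assumptions for illustration, not taken from [37] or [38].

```python
import math

# Minimal sketch of an autoscaler's sizing decision: choose the number
# of servers from the incoming request rate, plus a safety margin.

PER_SERVER_RPS = 100.0   # requests/s one server handles within the SLO (assumed)
HEADROOM = 1.2           # 20% spare capacity (assumed)

def servers_needed(request_rate_rps: float, n_min: int = 1) -> int:
    """Smallest server count covering the rate with headroom, at least n_min."""
    return max(n_min, math.ceil(request_rate_rps * HEADROOM / PER_SERVER_RPS))

print(servers_needed(450.0))  # ceil(540 / 100) = 6
print(servers_needed(0.0))    # minimum of 1 server
```

A production autoscaler would also smooth the rate estimate and apply a scale-down delay (as in the InferLine statement above) to avoid oscillation.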