Improving Resource Efficiency at Scale with Heracles

Lo, David; Cheng, Liqun; Govindaraju, Rama; Ranganathan, Parthasarathy; Kozyrakis, Christos

doi:10.1145/2882783

Cited by 148 publications

(242 citation statements)

References 78 publications

Citation types: Supporting, Mentioning (240), Contrasting

“…Servers running latency-critical applications operate at low utilization to guard against queuing delays, long requests, and other sources of performance variability. Further, their spare capacity cannot be used by batch applications, as uncontrolled sharing of cores, caches, and power causes high and unpredictable tail latency degradation [30,33,36]. As a result, datacenter servers typically have utilizations of 5-30% [8,9,37].…”

Section: A. Anatomy of Latency-critical Applications (mentioning, confidence: 99%)

“…These techniques include new cluster managers that schedule and migrate applications across systems to reduce interference [18,32,36,54], fast dynamic voltage-frequency scaling (DVFS) techniques to improve power efficiency [25,29,32,48], hardware and software schemes to use low power idle states [37,39,53], and hardware resource partitioning schemes that allow batch workloads to run alongside latency-critical ones, improving utilization [29,30,33,57].…”

Section: A. Anatomy of Latency-critical Applications (mentioning, confidence: 99%)

“…Many of these studies use workloads internal to datacenter operators like Google or Facebook [32,33,36,38,55,56]. Academic studies use one or a few latency-critical benchmarks [25,48,54], which limits the range of behaviors and performance requirements across which their proposed techniques can be evaluated.…”

Section: A. Anatomy of Latency-critical Applications (mentioning, confidence: 99%)

“…Latency-critical applications have a wide variety of latency requirements and microarchitectural characteristics. However, most recent work in this area uses one or a few latency-critical applications in their evaluations [25,32,33,48], which do not stress a wide range of behaviors. Some prior work in this area even uses more readily available sequential and parallel batch workloads (e.g., from SPEC CPU2006 or PARSEC [12]) and treats them as latency-critical applications [15,57].…”

Section: Introduction (mentioning, confidence: 99%)

TailBench: a benchmark suite and evaluation methodology for latency-critical applications

Kasture

Sánchez

2016

2016 IEEE International Symposium on Workload Characterization (IISWC)

134

Abstract: Latency-critical applications, common in datacenters, must achieve small and predictable tail (e.g., 95th or 99th percentile) latencies. Their strict performance requirements limit utilization and efficiency in current datacenters. These problems have sparked research in hardware and software techniques that target tail latency. However, research in this area is hampered by the lack of a comprehensive suite of latency-critical benchmarks. We present TailBench, a benchmark suite and evaluation methodology that makes latency-critical workloads as easy to run and characterize as conventional, throughput-oriented ones. TailBench includes eight applications that span a wide range of latency requirements and domains, and a harness that implements a robust and statistically sound load-testing methodology. The modular design of the TailBench harness facilitates multiple load-testing scenarios, ranging from multi-node configurations that capture network overheads, to simplified single-node configurations that allow measuring tail latency in simulation. Validation results show that the simplified configurations are accurate for most applications. This flexibility enables rapid prototyping of hardware and software techniques for latency-critical workloads.
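TailBench reports tail latency as a high percentile (e.g., 95th or 99th) of per-request latency. The sketch below is a minimal illustration, not the TailBench harness itself: it computes p50/p95/p99 from a list of recorded latencies using the nearest-rank percentile definition. The function names `percentile` and `tail_report` and the synthetic exponential latencies are assumptions for illustration, and TailBench may use a different percentile estimator.

```python
import math
import random


def percentile(samples, p):
    """Return the p-th percentile (0 < p <= 100) using the nearest-rank method."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    # Nearest rank: smallest value with at least p% of samples at or below it.
    # Multiply before dividing so integer p and n give an exact rank.
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def tail_report(latencies_ms):
    """Summarize the median and the tail of a set of request latencies."""
    return {
        "p50": percentile(latencies_ms, 50),
        "p95": percentile(latencies_ms, 95),
        "p99": percentile(latencies_ms, 99),
    }


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical service times: exponential with a 2 ms mean.
    lat = [rng.expovariate(1 / 2.0) for _ in range(10_000)]
    print(tail_report(lat))
```

Nearest rank always returns an observed sample, which keeps the reported tail tied to a real request; interpolating estimators trade that property for smoother values at small sample counts.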

“…This work is orthogonal to ours and could be a useful additional signal for our control plane. Heracles manages multiple hardware and software isolation mechanisms, including packet scheduling and cache partitioning, to co-locate latency-sensitive applications with batch tasks while maintaining millisecond SLOs [30]. We limit our focus to DVFS and core assignment but target more aggressive SLOs.…”

Section: Related Work (mentioning, confidence: 99%)
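The statement above describes Heracles only at a high level. To illustrate the general idea of slack-based co-location control, where batch work is throttled as the latency-critical service approaches its SLO, here is a hypothetical decision function. The thresholds (85% load, 10% slack), the `Action` names, and the function signatures are assumptions for illustration and are not taken from the Heracles paper.

```python
from enum import Enum


class Action(Enum):
    DISABLE_BATCH = "disable batch tasks"
    FREEZE_BATCH = "keep batch allocation, disallow growth"
    GROW_BATCH = "allow batch tasks to take more resources"


def slack(tail_latency_ms, slo_ms):
    """Fraction of the SLO still unused; negative means the SLO is violated."""
    if slo_ms <= 0:
        raise ValueError("slo_ms must be positive")
    return (slo_ms - tail_latency_ms) / slo_ms


def decide(tail_latency_ms, slo_ms, load, max_load=0.85, min_slack=0.10):
    """Pick a co-location action from latency slack and load (hypothetical thresholds)."""
    s = slack(tail_latency_ms, slo_ms)
    # SLO violated or service near peak load: give everything back to it.
    if s < 0 or load > max_load:
        return Action.DISABLE_BATCH
    # Little headroom left: keep batch tasks but stop expanding them.
    if s < min_slack:
        return Action.FREEZE_BATCH
    # Plenty of headroom: batch tasks may grow.
    return Action.GROW_BATCH
```

In a real system each action would drive the isolation mechanisms the statement mentions (core assignment, cache partitioning, packet scheduling); this sketch only shows the top-level decision.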

Energy proportionality and workload consolidation for latency-critical applications

Prekas

Primorac

Belay

et al. 2015

Proceedings of the Sixth ACM Symposium on Cloud Computing

Self Cite

Energy proportionality and workload consolidation are important objectives towards increasing efficiency in large-scale datacenters. Our work focuses on achieving these goals in the presence of applications with µs-scale tail latency requirements. Such applications represent a growing subset of datacenter workloads and are typically deployed on dedicated servers, which is the simplest way to ensure low tail latency across all loads. Unfortunately, it also leads to low energy efficiency and low resource utilization during the frequent periods of medium or low load. We present the OS mechanisms and dynamic control needed to adjust core allocation and voltage/frequency settings based on the measured delays for latency-critical workloads. This allows for energy proportionality and frees the maximum amount of resources per server for other background applications, while respecting service-level objectives. Monitoring hardware queue depths allows us to detect increases in queuing latencies. Carefully coordinated adjustments to the NIC's packet redirection table enable us to reassign flow groups between the threads of a latency-critical application in milliseconds without dropping or reordering packets. We compare the efficiency of our solution to the Pareto-optimal frontier of 224 distinct static configurations. Dynamic resource control saves 44%-54% of processor energy, which corresponds to 85%-93% of the Pareto-optimal upper bound. Dynamic resource control also allows background jobs to run at 32%-46% of their standalone throughput, which corresponds to 82%-92% of the Pareto bound.
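The abstract describes adjusting core allocation and voltage/frequency settings from measured queuing delay. One simple way to picture such a control loop is a hysteresis controller that steps along an ordered ladder of (cores, frequency) configurations; this is a sketch under stated assumptions, not the authors' controller. The ladder contents, the 0.8/0.3 watermarks, and the names `Config`, `LADDER`, and `DelayController` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    cores: int       # cores given to the latency-critical app
    freq_ghz: float  # DVFS setting for those cores


# Hypothetical ladder of configurations, ordered from least to most capacity.
LADDER = [
    Config(1, 1.2), Config(1, 2.0), Config(2, 1.2), Config(2, 2.0),
    Config(4, 1.2), Config(4, 2.0), Config(8, 2.0),
]


class DelayController:
    """Hysteresis controller stepping along LADDER based on measured queuing delay."""

    def __init__(self, slo_us, total_cores, high=0.8, low=0.3):
        if total_cores < LADDER[-1].cores:
            raise ValueError("total_cores must cover the largest configuration")
        self.slo_us = slo_us
        self.total_cores = total_cores
        self.high = high  # step up when delay exceeds high * SLO
        self.low = low    # step down when delay is below low * SLO
        self.level = len(LADDER) - 1  # start conservatively at max capacity

    def update(self, queuing_delay_us):
        """Move one rung up or down the ladder and return the new configuration."""
        if queuing_delay_us > self.high * self.slo_us:
            self.level = min(self.level + 1, len(LADDER) - 1)
        elif queuing_delay_us < self.low * self.slo_us:
            self.level = max(self.level - 1, 0)
        # Between the watermarks the configuration is left unchanged.
        return LADDER[self.level]

    def background_cores(self):
        """Cores not held by the latency-critical app, available for batch work."""
        return self.total_cores - LADDER[self.level].cores


if __name__ == "__main__":
    ctrl = DelayController(slo_us=500, total_cores=8)
    for delay in [50, 40, 30, 450]:
        print(ctrl.update(delay), ctrl.background_cores())
```

The gap between the two watermarks prevents the controller from oscillating when delay hovers near a single threshold; a production controller might also step up faster than it steps down, since SLO risk is costlier than idle capacity.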

Probability distribution based resource management for multitenant cloud clusters

Zhou

Feng

Wang

2021

Concurrency and Computation

Accurate resource allocation and colocating jobs are effective ways to increase resource utilization and reduce costs in modern datacenters. The main challenges are fluctuations in resource consumption and interference among colocated jobs. Therefore, we propose a resource management scheme based on the probability distributions of resource consumption and completion time for multitenant cloud clusters. First, we found that the characteristics of the task can be well described by the probability distribution of resource consumption and completion time, and the probability distribution function can be obtained by Gaussian fitting. Second, we propose a probability distribution based resource allocation (PDRA) strategy for batch jobs. Third, we design a tail latency aware allocation (TLAA) strategy to use transient resources efficiently while ensuring tail latency requirements. Finally, we design a cost‐effective resource revocation (CERR) strategy to revoke transient resources with minimal eviction costs. Experimental results demonstrate the efficiency of our resource management. Our resource allocation strategies (PDRA and TLAA) can effectively improve resource utilization and reduce job completion time. CERR can reduce the impact of resource revocation on batch jobs and perform better than existing resource revocation strategies.
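The abstract states that resource consumption and completion time can be described by probability distributions obtained through Gaussian fitting. A minimal sketch of that idea, assuming "Gaussian fitting" means estimating a mean and standard deviation from observed samples, sizes a resource request at a chosen quantile of the fitted distribution and estimates the chance that a completion time exceeds a deadline. The quantile, the optional capacity cap, and the function names are illustrative assumptions; this is not the PDRA or TLAA strategy itself.

```python
from statistics import NormalDist


def fit(samples):
    """Fit a Gaussian (sample mean and stdev) to observations; needs >= 2 samples."""
    return NormalDist.from_samples(samples)


def allocation(samples, quantile=0.95, capacity=None):
    """Request enough resource to cover `quantile` of the fitted demand distribution."""
    request = max(0.0, fit(samples).inv_cdf(quantile))
    if capacity is not None:
        # Never ask for more than the node can offer.
        request = min(request, capacity)
    return request


def overrun_probability(samples, deadline):
    """Probability that a completion time drawn from the fitted Gaussian exceeds deadline."""
    return 1.0 - fit(samples).cdf(deadline)


if __name__ == "__main__":
    cpu_cores = [1.8, 2.1, 2.0, 2.4, 1.9, 2.2]   # hypothetical usage samples
    runtimes_s = [40.0, 42.5, 39.0, 45.0, 41.0]  # hypothetical completion times
    print(allocation(cpu_cores, quantile=0.95, capacity=4.0))
    print(overrun_probability(runtimes_s, deadline=46.0))
```

Raising the quantile trades utilization for a lower risk of under-provisioning; a Gaussian also assigns probability to negative demand, which is why the request is clamped at zero.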
