2019 IEEE International Symposium on High Performance Computer Architecture (HPCA)
DOI: 10.1109/hpca.2019.00037
Enhancing Server Efficiency in the Face of Killer Microseconds

Cited by 24 publications (8 citation statements)
References 105 publications
“…in large warehouse-scale computers, and present some optimizations that can help mitigate such system bottlenecks. Mirhosseini et al. [116] explore killer microseconds: microsecond-scale "holes" in CPU schedules caused by I/O stalls or idle periods between requests in the high-throughput microservices that are typical in data centers. They then propose enhancements to server architectures to help mitigate such effects.…”
Section: PUE = Total_Power_Consumption / IT_Power_Consumption
confidence: 99%
“…A further option is to utilize larger caches, such as the private (per-core) L2 caches or the shared L3 caches, to also store state for the additional hardware threads, similar to Duplexity [56]. A fraction of a 512 KB private L2 cache can store the state of tens of threads, while a few MB of an L3 cache can support hundreds of threads.…”
Section: The Space of Hardware Designs
confidence: 99%
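The capacity claim above can be checked with a back-of-envelope sketch. The per-thread state footprint (4 KB) and the cache fractions used below are illustrative assumptions, not figures taken from [56]:

```python
# Back-of-envelope check: how many hardware-thread contexts fit in a
# fraction of an L2 or a few MB of L3? The 4 KB per-thread footprint and
# the fractions used are assumptions for illustration, not data from [56].

state_per_thread_kb = 4            # assumed per-thread context footprint
l2_kb = 512                        # private L2 size mentioned in the text
l2_fraction = 0.25                 # use only "a fraction" of the L2
l3_used_kb = 2 * 1024              # "a few MB" of shared L3; assume 2 MB

threads_in_l2 = int(l2_kb * l2_fraction) // state_per_thread_kb
threads_in_l3 = l3_used_kb // state_per_thread_kb

print(threads_in_l2, threads_in_l3)  # 32 512: tens in L2, hundreds in L3
```

Under these assumptions the arithmetic matches the statement's orders of magnitude: tens of thread contexts in part of an L2, hundreds in a few MB of L3.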
“…The combination of PS scheduling with thread-per-request will actually provide superior performance for server workloads with high execution-time variability [46,80]. In addition to RR scheduling, we can introduce hardware support for thread priorities (e.g., threads used for serving time-sensitive interrupts receive more cycles [56]) or even hardware-based (but software-managed) thread queuing, load balancing, priorities, and scheduling [29,52,67]. Hardware support will be needed for fine-grained tracking of threads' resource consumption for cloud billing or software decisions.…”
Section: The Space of Hardware Designs
confidence: 99%
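The advantage of processor sharing (PS) over FCFS under high execution-time variability can be illustrated with standard M/G/1 queueing results; the arrival rate, mean service time, and variability below are assumed parameters, not measurements from the cited works:

```python
# Illustrative comparison of FCFS vs. processor-sharing (PS) mean response
# time in an M/G/1 queue, using the Pollaczek-Khinchine formula for FCFS
# and the insensitivity result for PS. All parameters are assumed values.

lam = 0.8        # arrival rate (requests per unit time), assumed
mean_s = 1.0     # mean service time, assumed
scv = 25.0       # squared coefficient of variation: high variability

rho = lam * mean_s                    # server utilization
e_s2 = (scv + 1.0) * mean_s ** 2      # E[S^2] recovered from the SCV

# FCFS: Pollaczek-Khinchine mean waiting time, plus the service time itself.
t_fcfs = lam * e_s2 / (2.0 * (1.0 - rho)) + mean_s

# PS: mean response time depends only on utilization, not on variability.
t_ps = mean_s / (1.0 - rho)

print(f"FCFS mean response time: {t_fcfs:.1f}")  # 53.0 at this variability
print(f"PS   mean response time: {t_ps:.1f}")    # 5.0, regardless of SCV
```

PS's mean response time is insensitive to service-time variability, while FCFS's grows linearly with E[S^2], which is why PS-style scheduling with thread-per-request suits highly variable server workloads.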
“…It supports strict priority-based resource allocation policies but fails to consider resource contention between colocated tasks. In addition, Duplexity [47] uses an aggressive multithreading technique to hide latencies on the order of microseconds.…”
Section: Related Work
confidence: 99%