Power dissipation is increasingly important in CPUs rang
Speculative execution, the base on which modern high-performance general-purpose CPUs are built on, has recently been shown to enable a slew of security attacks. All these attacks are centered around a common set of behaviors: During speculative execution, the architectural state of the system is kept unmodified, until the speculation can be verified. In the event that a misspeculation occurs, then anything that can affect the architectural state is reverted (squashed) and re-executed correctly. However, the same is not true for the microarchitectural state. Normally invisible to the user, changes to the microarchitectural state can be observed through various sidechannels, with timing differences caused by the memory hierarchy being one of the most common and easy to exploit. The speculative side-channels can then be exploited to perform attacks that can bypass software and hardware checks in order to leak information. These attacks, out of which the most infamous are perhaps Spectre and Meltdown, have led to a frantic search for solutions.In this work, we present our own solution for reducing the microarchitectural state-changes caused by speculative execution in the memory hierarchy. It is based on the observation that if we only allow accesses that hit in the L1 data cache to proceed, then we can easily hide any microarchitectural changes until after the speculation has been verified. At the same time, we propose to prevent stalls by value predicting the loads that miss in the L1. Value prediction, though speculative, constitutes an invisible form of speculation, not seen outside the core. We evaluate our solution and show that we can prevent observable microarchitectural changes in the memory hierarchy while keeping the performance and energy costs at 11% and 7%, respectively. In comparison, the current state of the art solution, InvisiSpec, incurs a 46% performance loss and a 51% energy increase.
Much of the complexity and overhead (directory, state bits, invalidations) of a typical directory coherence implementation stems from the effort to make it "invisible" even to the strongest memory consistency model. In this paper, we show that a much simpler, directory-less/broadcast-less, multicore coherence can outperform a directory protocol but without its complexity and overhead. Motivated by recent efforts to simplify coherence, we propose a hardware approach that does not require any application guidance. The cornerstone of our approach is a dynamic, application-transparent, write-policy (write-back for private data, write-through for shared data), simplifying the protocol to just two stable states. Self-invalidation of the shared data at synchronization points allows us to remove the directory (and invalidations) completely, with just a data-race-free guarantee from software. This leads to our main result: a virtually costless coherence that outperforms a MESI directory protocol (by 4.8%) while at the same time reducing shared cache and network energy consumption (by 14.2%) for 15 parallel benchmarks, on 16 cores.
Abstract-We present Continuously Adaptive Dynamic Voltage/Frequency scaling in Linux systems running on Intel i7 and AMD Phenom II processors. By exploiting slack, inherent in memory-bound programs, our approach aims to improve power efficiency even when the processor does not sit idle. Our underlying methodology is based on a simple first-order processor performance model in which frequency scaling is expressed as a change (in cycles) of the main memory latency. Utilizing available monitoring hardware we show that our model is powerful enough to i) predict with reasonable accuracy the effect of frequency scaling (in terms of performance loss) and ii) predict the core energy under different V/f combinations. To validate our approach we perform highly accurate, fine-grained power measurements directly on the off-chip voltage regulators. We use our model to implement various DVFS policies as Linux "green" governors to continuously optimize for various powerefficiency metrics such as EDP or ED 2 P, or achieve energy savings with a user-specified limit on performance loss. Our evaluation shows that, for SPEC2006 workloads, our governors achieve dynamically the same optimal EDP or ED 2 P (within 2% on avg.) as an exhaustive search of all possible frequencies. Energy savings can reach up to 56% in memory-bound workloads with corresponding improvements of about 55% for EDP or ED 2 P.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.