As we enter the era of CMP platforms with multiple threads/cores on the die, the diversity of the simultaneous workloads running on them is expected to increase. The rapid deployment of virtualization as a means to consolidate workloads on to a single platform is a prime example of this trend. In such scenarios, the quality of service (QoS) that each individual workload gets from the platform can widely vary depending on the behavior of the simultaneously running workloads. While the number of cores assigned to each workload can be controlled, there is no hardware or software support in today's platforms to control allocation of platform resources such as cache space and memory bandwidth to individual workloads. In this paper, we propose a QoS-enabled memory architecture for CMP platforms that addresses this problem. The QoS-enabled memory architecture enables more cache resources (i.e. space) and memory resources (i.e. bandwidth) for high priority applications based on guidance from the operating environment. The architecture also allows dynamic resource reassignment during run-time to further optimize the performance of the high priority application with minimal degradation to low priority. To achieve these goals, we will describe the hardware/software support required in the platform as well as the operating environment (O/S and virtual machine monitor). Our evaluation framework consists of detailed platform simulation models and a QoS-enabled version of Linux. Based on evaluation experiments, we show the effectiveness of a QoSenabled architecture and summarize key findings/trade-offs.
Protection from hardware attacks such as snoopers and mod chips has been receiving increasing attention in computer architecture. This paper presents a new combined memory encryption/authentication scheme. Our new split counters for counter-mode encryption simultaneously eliminate counter overflow problems and reduce per-block counter size, and we also dramatically improve authentication performance and security by using the Galois/Counter Mode of operation (GCM), which leverages counter-mode encryption to reduce authentication latency and overlap it with memory accesses. Our results indicate that the split-counter scheme has a negligible overhead even with a small (32KB) counter cache and using only eight counter bits per data block. The combined encryption/authentication scheme has an IPC overhead of 5% on average across SPEC CPU 2000 benchmarks, which is a significant improvement over the 20% overhead of existing encryption/authentication schemes.
This paper introduces the idea of using a User-Level Memory Thread (ULMT) for correlation prefetching. In this approach, a user thread runs on a general-purpose processor in main memory, either in the memory controller chip or in a DRAM chip. The thread performs correlation prefetching in software, sending the prefetched data into the L2 cache of the main processor. This approach requires minireal hardware beyond the memory processor: the correlation table is a software data structure that resides in main memory, while the main processor only needs a few modifications to its L2 cache so that it can accept incoming prefetches. In addition, the approach has wide usability, as it can effectively prefetch even for irregular applications. Finally, it is very flexible, as the prefetching algorithm can be customized by the user on an application basis. Our simulation results show that, through a new design of the correlation table and prefetching algorithm, our scheme delivers good results. Specifically, nine mostly-irregular applications show an average speedup of 1.32. Furthermore, our scheme works well in combination with a conventional processor-side sequential prefetcher, in which case the average speedup increases to 1.46. Finally, by exploiting the customization of the prefetching algorithm, we increase the average speedup to 1.53.
Memory bugs are a broad class of bugs that is becoming increasingly common with increasing software complexity, and many of these bugs are also security vulnerabilities. Unfortunately, existing software and even hardware approaches for finding and identifying memory bugs have considerable performance overheads, target only a narrow class of bugs, are costly to implement, or use computational resources inefficiently.This paper describes MemTracker, a new hardware support mechanism that can be configured to perform different kinds of memory access monitoring tasks. MemTracker associates each word of data in memory with a few bits of state, and uses a programmable state transition table to react to different events that can affect this state. The number of state bits per word, the events to which MemTracker reacts, and the transition table are all fully programmable. MemTracker's rich set of states, events, and transitions can be used to implement different monitoring and debugging checkers with minimal performance overheads, even when frequent state updates are needed. To evaluate MemTracker, we map three different checkers onto it, as well as a checker that combines all three. For the most demanding (combined) checker, we observe performance overheads of only 2.7% on average and 4.8% worst-case on SPEC 2000 applications. Such low overheads allow continuous (always-on) use of MemTrackerenabled checkers even in production runs.
scite is a Brooklyn-based startup that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.