As we embrace the era of chip multi-processors (CMPs), we are faced with two major architectural challenges: (i) QoS or performance management of disparate applications running on CPU cores contending for shared cache/memory resources and (ii) global/local power management techniques to stay within overall platform constraints. The problem is exacerbated as the number of cores sharing the resources in a chip increases. In the past, researchers have proposed independent solutions for these two problems. In this paper, we show that rate-based techniques employed for power management can be adapted to address cache/memory QoS issues. The basic approach is to throttle down the processing rate of a core if it is running a low-priority task whose execution interferes with the performance of a high-priority task due to platform resource contention (i.e., cache or memory contention). We evaluate two rate-throttling mechanisms (clock modulation and frequency scaling) for effectively managing the interference between applications running on a CMP platform and delivering QoS/performance management. We show that clock modulation is much more applicable to cache/memory QoS than frequency scaling, and that resource monitoring combined with rate control provides effective power-performance management in CMP platforms.
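To make the throttling mechanism concrete, the following Python sketch (an illustration, not the authors' implementation) shows how a core's duty cycle can be reduced on Linux through the x86 IA32_CLOCK_MODULATION MSR (0x19A), assuming the msr kernel module is loaded and root privileges. The policy loop and the interference metric (`misses_per_kilo_instr`, `threshold`) are hypothetical placeholders for a real resource monitor.

```python
import os
import struct

IA32_CLOCK_MODULATION = 0x19A  # x86 on-demand clock-modulation MSR

def set_duty_cycle(cpu: int, eighths: int) -> None:
    """Throttle `cpu` to roughly eighths/8 of its clock rate (1..7),
    or restore full speed when eighths == 8.

    Requires the Linux `msr` module (/dev/cpu/N/msr) and root.
    """
    if eighths == 8:
        value = 0x00  # disable on-demand clock modulation
    else:
        # Bit 4 enables modulation; bits 3:1 encode the duty cycle.
        value = 0x10 | ((eighths & 0x7) << 1)
    fd = os.open(f"/dev/cpu/{cpu}/msr", os.O_WRONLY)
    try:
        os.pwrite(fd, struct.pack("<Q", value), IA32_CLOCK_MODULATION)
    finally:
        os.close(fd)

# Hypothetical policy: if the low-priority core's cache pressure is
# hurting the high-priority task, halve the low-priority core's rate.
def police(low_prio_cpu: int, misses_per_kilo_instr: float,
           threshold: float) -> None:
    set_duty_cycle(low_prio_cpu,
                   4 if misses_per_kilo_instr > threshold else 8)
```

Clock modulation suits this use because it takes effect per-core and almost immediately, whereas frequency scaling typically has coarser granularity and slower transition latency.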
Virtualization provides levels of execution isolation and service partitioning that are desirable in many usage scenarios, but its associated overheads are a major impediment to wide deployment of virtualized environments. While the virtualization cost depends heavily on the workload, it has been demonstrated that the overhead is much higher for I/O-intensive workloads than for compute-intensive ones. Unfortunately, the architectural reasons behind these I/O performance overheads are not well understood. Early research characterizing these penalties has shown that cache misses and TLB-related overheads account for most of the I/O virtualization cost. While most of these evaluations rely on measurement, in this paper we present an execution-driven, simulation-based analysis methodology with symbol annotation as a means of evaluating the performance of virtualized workloads. This methodology provides detailed information at the architectural level (with a focus on cache and TLB behavior) and allows designers to evaluate potential hardware enhancements to reduce virtualization overhead. We apply this methodology to study the network I/O performance of Xen (as a case study) in a full-system simulation environment, using detailed cache and TLB models to profile and characterize software and hardware hotspots. By applying symbol annotation to the instruction flow reported by the execution-driven simulator, we derive function-level call-flow information. We follow the anatomy of I/O processing in a virtualized platform for network transmit and receive scenarios and demonstrate the impact of cache scaling and TLB size scaling on performance.
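As a concrete illustration of the symbol-annotation step (a minimal sketch, not the paper's tool), the Python snippet below maps instruction addresses emitted by an execution-driven simulator to function names using a symbol table dumped with `nm`, yielding the kind of function-level call-flow information described above. The trace format and the file names (`xen-syms`, `trace.txt`) are assumptions.

```python
import bisect
import subprocess

def load_symbols(binary: str):
    """Parse `nm -n --defined-only` output into sorted (address, name) pairs."""
    out = subprocess.run(["nm", "-n", "--defined-only", binary],
                         capture_output=True, text=True, check=True).stdout
    syms = []
    for line in out.splitlines():
        parts = line.split()
        if len(parts) == 3 and parts[1].lower() == "t":  # text (code) symbols
            syms.append((int(parts[0], 16), parts[2]))
    return syms

def annotate(trace_addrs, syms):
    """Map each executed instruction address to the enclosing function name."""
    starts = [addr for addr, _ in syms]
    for addr in trace_addrs:
        i = bisect.bisect_right(starts, addr) - 1
        yield (addr, syms[i][1] if i >= 0 else "<unknown>")

# Example: annotate a simulator trace against a hypervisor image
# ("xen-syms" and "trace.txt" are hypothetical names).
# syms = load_symbols("xen-syms")
# for addr, fn in annotate((int(l, 16) for l in open("trace.txt")), syms):
#     print(hex(addr), fn)
```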