Abstract: Performance, power, and temperature are now all first-order design constraints. Balancing power efficiency, thermal constraints, and performance requires some means to convey data about real-time power consumption and temperature to intelligent resource managers. Resource managers can use this information to meet performance goals, maintain power budgets, and obey thermal constraints. Unfortunately, obtaining the required machine introspection is challenging. Most current chips provide no support for per-core power monitoring, and when support exists, it is not exposed to software. We present a methodology for deriving per-core power models using sampled performance counter values and temperature sensor readings. We develop application-independent models for four different (four- to eight-core) platforms, validate their accuracy, and show how they can be used to guide scheduling decisions in power-aware resource managers. Model overhead is negligible, and estimations exhibit 1.1%-5.2% per-suite median error on the NAS, SPEC OMP, and SPEC 2006 benchmarks (and 1.2%-4.4% overall).
I. INTRODUCTION
Power and temperature have joined performance as first-order system design constraints. All three influence each other, and together they affect architectural and packaging choices. Power consumption characteristics further influence operating cost, reliability, battery lifetime, and device lifetime. Balancing power efficiency and thermal constraints with performance requires intelligent resource management, and achieving that balance requires real-time power consumption and temperature information broken down according to resource, together with software and hardware that can leverage such information to enforce management policies.
One logical place to institute intelligent resource management with respect to power, performance, and temperature for chip multiprocessor (CMP) systems is at the level of individual cores.
Measuring run-time power of a single core is problematic, though. Current chips do not support it. Power meters only report total consumption for everything behind a single power cable, and even if such aggregate data were sufficient, the use of meters becomes completely infeasible as machines scale up: coordinating output and feedback from thousands of meters would require a separate (super) computing system. Cycle-level system simulators provide in-depth information, but are extremely time consuming and prone to error. Power models implemented on top of the architectural abstractions in such simulators are inherently inaccurate [19], and are impossible to verify when attempting to assess new architectural designs. Hardware could be enhanced to measure the current and power draw of a CPU socket, but per-core measurement is difficult when cores share a power plane. Embedding measurement devices on-chip is usually infeasible. Even when measurement facilities exist (e.g., the Intel Core i7 [16] features per-core power monitoring at the chip level), they ar...
978-1-4244-7614-5/10/$26.00 ©2010 IEEE
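The abstract above describes deriving per-core power models from sampled performance counter values and temperature sensor readings. The following is a minimal sketch of that idea, not the authors' model: it assumes a plain linear form fitted by ordinary least squares, and the feature names (instructions per cycle, L2 miss rate, core temperature), the sample values, and the weights are all hypothetical and chosen only for illustration.

```python
# Sketch only: a linear per-core power model fitted by least squares.
# The linear form, the three features, and all numbers are assumptions.

def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_power_model(samples, powers):
    """Fit power ~ w0 + sum(w_i * feature_i) via the normal equations."""
    rows = [[1.0] + list(s) for s in samples]
    k = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    atb = [sum(r[i] * p for r, p in zip(rows, powers)) for i in range(k)]
    return solve(ata, atb)


def estimate_power(weights, sample):
    """Estimate one core's power (watts) from one sample of its features."""
    return weights[0] + sum(w * x for w, x in zip(weights[1:], sample))


# Hypothetical training samples: (IPC, L2 miss rate, core temperature in C).
samples = [
    (0.5, 0.02, 45.0), (1.2, 0.01, 52.0), (0.8, 0.05, 48.0),
    (1.5, 0.03, 60.0), (0.3, 0.04, 41.0), (1.0, 0.06, 55.0),
]
# Synthetic "measured" power generated from made-up weights, so the fit
# can be checked against known values.
true_weights = [2.0, 6.0, 40.0, 0.1]
powers = [estimate_power(true_weights, s) for s in samples]

weights = fit_power_model(samples, powers)

# Per-core estimates that a power-aware scheduler could compare to a budget.
current = {"core0": (1.1, 0.02, 50.0), "core1": (0.4, 0.05, 43.0)}
per_core = {core: estimate_power(weights, s) for core, s in current.items()}
total_estimate = sum(per_core.values())
print(per_core, total_estimate)
```

In practice the training power values would come from a measured reference (for example, a power meter during calibration runs) rather than from synthetic weights; the synthetic data here only makes the fit verifiable.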
System designers and application programmers must consider trade-offs between performance and energy. Making energy-aware decisions when designing an application or runtime system requires quantitative information about power consumed by different processor components. We present a methodology to model static and dynamic power consumption of individual cores and the uncore components, and we validate our power model for both sequential and parallel benchmarks at different voltage-frequency pairs on an Intel® Haswell platform. Our power models yield the following insights about energy-efficient scaling. (1) We show that uncore energy accounts for up to 74% of total energy. In particular, uncore static energy can be as high as 61% of total energy, potentially making it a major source of energy inefficiency. (2) We find that the frequency at which an application expends the lowest energy depends on how memory-bound it is. (3) We demonstrate that even though using more cores may improve performance, the energy consumed by stalled cores during serial portions of the program can make using fewer cores more energy-efficient.
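Insight (2) above can be illustrated with a toy energy model. This is a sketch under stated assumptions, not the paper's model: it assumes a constant static power, dynamic power that grows with the cube of frequency (voltage scaling roughly linearly with frequency), and a runtime in which only the compute portion speeds up with frequency while the memory portion stays fixed. The function names and every constant are hypothetical.

```python
# Sketch only: energy = (static power + dynamic power) * runtime.
# All constants and the cubic dynamic-power form are assumptions.

STATIC_POWER_W = 20.0      # assumed core + uncore static power
DYN_COEFF = 0.5            # assumed dynamic power coefficient, W / GHz^3
REF_FREQ_GHZ = 2.0         # frequency at which the reference runtime holds
REF_RUNTIME_S = 10.0       # assumed runtime at the reference frequency
FREQUENCIES_GHZ = [1.2, 1.6, 2.0, 2.4, 2.8, 3.2, 3.6]


def runtime(freq_ghz, memory_fraction):
    """Compute time scales with 1/f; memory time is assumed frequency-independent."""
    compute = (1.0 - memory_fraction) * REF_RUNTIME_S * REF_FREQ_GHZ / freq_ghz
    memory = memory_fraction * REF_RUNTIME_S
    return compute + memory


def energy(freq_ghz, memory_fraction):
    """Total energy in joules at a given frequency."""
    power = STATIC_POWER_W + DYN_COEFF * freq_ghz ** 3
    return power * runtime(freq_ghz, memory_fraction)


def best_frequency(memory_fraction):
    """Frequency from the candidate list that minimizes energy."""
    return min(FREQUENCIES_GHZ, key=lambda f: energy(f, memory_fraction))


compute_bound_best = best_frequency(0.0)
memory_bound_best = best_frequency(0.8)
print(compute_bound_best, memory_bound_best)
```

With these assumed parameters the memory-bound workload reaches its energy minimum at a lower frequency than the compute-bound one, because raising the clock shortens only a small part of its runtime while dynamic power still rises. Different constants would move both minima.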
Hardware transactional memory implementations are becoming increasingly available. For instance, the Intel Core™ i7 4770 implements Restricted Transactional Memory (RTM) support for Intel Transactional Synchronization Extensions (TSX). In this paper, we present a detailed evaluation of RTM performance and energy expenditure. We compare RTM behavior to that of the TinySTM software transactional memory system, first by running microbenchmarks, and then by running the STAMP benchmark suite. We find that which system performs better depends heavily on the workload characteristics. We then conduct a case study of two STAMP applications to assess the impact of programming style on RTM performance and to investigate what kinds of software optimizations can help overcome RTM's hardware limitations.