Bhavishya Goel scite author profile

Gioiosa

et al. 2010

Abstract: Performance, power, and temperature are now all first-order design constraints. Balancing power efficiency, thermal constraints, and performance requires some means to convey data about real-time power consumption and temperature to intelligent resource managers. Resource managers can use this information to meet performance goals, maintain power budgets, and obey thermal constraints. Unfortunately, obtaining the required machine introspection is challenging. Most current chips provide no support for per-core power monitoring, and when support exists, it is not exposed to software. We present a methodology for deriving per-core power models using sampled performance counter values and temperature sensor readings. We develop application-independent models for four different (four- to eight-core) platforms, validate their accuracy, and show how they can be used to guide scheduling decisions in power-aware resource managers. Model overhead is negligible, and estimations exhibit 1.1%-5.2% per-suite median error on the NAS, SPEC OMP, and SPEC 2006 benchmarks (and 1.2%-4.4% overall).

I. INTRODUCTION

Power and temperature have joined performance as first-order system design constraints. All three influence each other, and together they affect architectural and packaging choices. Power consumption characteristics further influence operating cost, reliability, battery lifetime, and device lifetime. Balancing power efficiency and thermal constraints with performance requires intelligent resource management, and achieving that balance requires real-time power consumption and temperature information broken down according to resource, together with software and hardware that can leverage such information to enforce management policies.

One logical place to institute intelligent resource management with respect to power, performance, and temperature for chip multiprocessor (CMP) systems is at the level of individual cores.
Measuring run-time power of a single core is problematic, though. Current chips do not support it. Power meters only report total consumption for everything behind a single power cable, and even if such aggregate data were sufficient, the use of meters becomes completely infeasible as machines scale up: coordinating output and feedback from thousands of meters would require a separate (super) computing system. Cycle-level system simulators provide in-depth information, but are extremely time consuming and prone to error. Power models implemented on top of the architectural abstractions in such simulators are inherently inaccurate [19], and are impossible to verify when attempting to assess new architectural designs. Hardware could be enhanced to measure the current and power draw of a CPU socket, but per-core measurement is difficult when cores share a power plane. Embedding measurement devices on chip is usually infeasible. Even when measurement facilities exist (e.g., the Intel Core i7 [16] features per-core power monitoring at the chip level), they ar...

978-1-4244-7614-5/10/$26.00 ©2010 IEEE

A Methodology for Modeling Dynamic and Static Power Consumption for Multicore Processors

2016

System designers and application programmers must consider trade-offs between performance and energy. Making energy-aware decisions when designing an application or runtime system requires quantitative information about power consumed by different processor components. We present a methodology to model static and dynamic power consumption of individual cores and the uncore components, and we validate our power model for both sequential and parallel benchmarks at different voltage-frequency pairs on an Intel® Haswell platform. Our power models yield the following insights about energy-efficient scaling. (1) We show that uncore energy accounts for up to 74% of total energy. In particular, uncore static energy can be as high as 61% of total energy, potentially making it a major source of energy inefficiency. (2) We find that the frequency at which an application expends the lowest energy depends on how memory-bound it is. (3) We demonstrate that even though using more cores may improve performance, the energy consumed by stalled cores during serial portions of the program can make using fewer cores more energy-efficient.

Techniques to Measure, Model, and Manage Power

Själander

2012

Performance and Energy Analysis of the Restricted Transactional Memory Implementation on Haswell

Titos-Gil

Negi

et al. 2014

Hardware transactional memory implementations are becoming increasingly available. For instance, the Intel Core™ i7 4770 implements Restricted Transactional Memory (RTM) support for Intel Transactional Synchronization Extensions (TSX). In this paper, we present a detailed evaluation of RTM performance and energy expenditure. We compare RTM behavior to that of the TinySTM software transactional memory system, first by running microbenchmarks, and then by running the STAMP benchmark suite. We find that which system performs better depends heavily on the workload characteristics. We then conduct a case study of two STAMP applications to assess the impact of programming style on RTM performance and to investigate what kinds of software optimizations can help overcome RTM's hardware limitations.

Power-Aware Resource Scheduling in Base Stations

Själander

et al. 2011