Dimitra Papagiannopoulou scite author profile

Bahar

et al. 2011

While performance and power continue to be important metrics for embedded systems, as CMOS technologies continue to shrink, new metrics such as variability and reliability have emerged as limiting factors in the design of modern embedded systems. In particular, the reliability impact of pMOS negative bias temperature instability (NBTI) has become a serious concern. Recent works have shown how conventional leakage optimization techniques can help mitigate NBTI-induced aging effects on cache memories. In this paper we focus specifically on scratchpad memory (SPM) and present novel software approaches as a means of alleviating the NBTI-induced aging effects. In particular, we demonstrate how intelligent software directed data allocation strategies can extend the lifetime of partitioned SPMs by means of distributing the idleness across the memory sub-banks.

Speculative synchronization for coherence-free embedded NUMA architectures

et al. 2014

High-end embedded systems, like their generalpurpose counterparts, are turning to many-core cluster-based shared-memory architectures that provide a shared memory abstraction subject to non-uniform memory access (NUMA) costs. In order to keep the cores and memory hierarchy simple, manycore embedded systems tend to employ simple, scratchpad-like memories, rather than hardware managed caches that require some form of cache coherence management. These "coherencefree" systems still require some means to synchronize memory accesses and guarantee memory consistency. Conventional lockbased approaches may be employed to accomplish the synchronization, but may lead to both useability and performance issues. Instead, speculative synchronization, such as hardware transactional memory, may be a more attractive approach. However, hardware speculative techniques traditionally rely on the underlying cache-coherence protocol to synchronize memory accesses among the cores. The lack of a cache-coherence protocol adds new challenges in the design of hardware speculative support. In this paper, we present a new scheme for hardware transactional memory support within a cluster-based NUMA system that lacks an underlying cache-coherence protocol. To the best of our knowledge, this is the first design for speculative synchronization for this type of architecture. Through a set of benchmark experiments using our simulation platform, we show that our design can achieve significant performance improvements over traditional lock-based schemes.

Energy-Efficient and High-Performance Lock Speculation Hardware for Embedded Multicore Systems

Capodanno

ACM Trans. Embed. Comput. Syst.

et al. 2015

Embedded systems are becoming increasingly common in everyday life and like their general-purpose counterparts, they have shifted towards shared memory multicore architectures. However, they are much more resource-constrained, and as they often run on batteries, energy efficiency becomes critically important. In such systems, achieving high concurrency is a key demand for delivering satisfactory performance at low energy cost. In order to achieve this high concurrency, consistency across the shared memory hierarchy must be accomplished in a cost-effective manner in terms of performance, energy, and implementation complexity. In this paper, we propose EMBEDDED-SPEC, a hardware solution for supporting transparent lock speculation, without the requirement for special supporting instructions. Using this approach, we evaluate the energy consumption and performance of a suite of benchmarks, exploring a range of contention management and retry policies. We conclude that for resource-constrained platforms, lock speculation can provide real benefits in terms of improved concurrency and energy efficiency, as long as the underlying hardware support is carefully configured.

Flexible data allocation for scratch-pad memories to reduce NBTI effects

Bahar

2013

Playing with Fire

et al. 2015

As silicon integration technology pushes toward atomic dimensions, errors due to static and dynamic variability are an increasing concern. To avoid such errors, designers often turn to "guardband" restrictions on the operating frequency and voltage. If guardbands are too conservative, they limit performance and waste energy, but less conservative guardbands risk moving the system closer to its Critical Operating Point (COP), a frequency-voltage pair that, if surpassed, causes massive instruction failures. In this paper, we propose a novel scheme that allows to dynamically adjust to an evolving COP and operate at highly reduced margins, while guaranteeing forward progress. Specifically, our scheme dynamically monitors the platform and adaptively adjusts to the COP among multiple cores, using lightweight checkpointing and roll-back mechanisms adopted from Hardware Transactional Memory (HTM) for error recovery. Experiments demonstrate that our technique is particularly effective in saving energy while also offering safe execution guarantees. To the best of our knowledge, this work is the first to describe a full-fledged HTM implementation for errorresilient and energy-efficient MPSoC execution.

Hardware Transactional Memory Exploration in Coherence-Free Many-Core Architectures

et al. 2018

Int J Parallel Prog

Hardware transactional memory exploration in coherence-free many-core architectures This work was made openly accessible by BU Faculty. Please share how this access benefits you. Your story matters.Abstract High-end embedded systems, like their general-purpose counterparts, are turning to many-core cluster-based shared-memory architectures that provide a shared memory abstraction subject to non-uniform memory access (NUMA) costs. In order to keep the cores and memory hierarchy simple, many-core embedded systems tend to employ simple, scratchpad-like memories, rather than hardware managed caches that require some form of cache coherence management. These "coherence-free" systems still require some means to synchronize memory accesses and guarantee memory consistency. Conventional lock-based approaches may be employed to accomplish the synchronization, but may lead to both usability and performance issues. Instead, speculative synchronization, such as hardware transactional memory, may be a more attractive approach. However, hardware speculative techniques traditionally rely on the underlying cache-coherence protocol to synchronize memory accesses among the cores. The lack of a cache-coherence protocol adds

Evaluating critical bits in arithmetic operations due to timing violations

Whang

Rachford

et al. 2017

Edge-TM

ACM Trans. Embed. Comput. Syst.

et al. 2017

Scaling of semiconductor devices has enabled higher levels of integration and performance improvements at the price of making devices more susceptible to the effects of static and dynamic variability. Adding safety margins (guardbands) on the operating frequency or supply voltage prevents timing errors, but has a negative impact on performance and energy consumption. We propose Edge-TM, an adaptive hardware/software error management policy that (i) optimistically scales the voltage beyond the edge of safe operation for better energy savings and (ii) works in combination with a Hardware Transactional Memory (HTM)-based error recovery mechanism. The policy applies dynamic voltage scaling (DVS) (while keeping frequency fixed) based on the feedback provided by HTM, which makes it simple and generally applicable. Experiments on an embedded platform show our technique capable of 57% energy improvement compared to using voltage guardbands and an extra 21-24% improvement over existing state-of-the-art error tolerance solutions, at a nominal area and time overhead.