Abstract: Modern microprocessors contain a variety of mechanisms used to mitigate errors in the logic and memory, referred to as Reliability, Availability, and Serviceability (RAS) techniques. Many of these techniques, such as component disabling, come at a performance cost. With the aggressive downscaling of device dimensions, it is reasonable to expect that chip-wide error rates will intensify in the future and perhaps vary throughout system lifetime. As a result, it is important to reclaim the temporal RAS overheads …
“…This mandates careful designs. To this end, modern (digital) microchip and computer architectures and real-time operating systems include Reliability, Availability, and Serviceability (RAS) modules which monitor ongoing operations and control the current processing speed (Rodopoulos et al., 2015; Noltsis, Rodopoulos, Catthoor, & Soudris, 2017).…”
To secure correct system operation, circuit designers have deployed a plethora of Reliability, Availability and Serviceability (RAS) techniques. RAS mechanisms, however, come at the cost of extra clock cycles. In addition, the wide variety of dynamic workloads and input conditions often makes preemptive dependability techniques hard to implement. To this end, we focus on a realistic case study of a closed-loop controller that mitigates performance variation with a reactive response. This concept has been discussed before but was only illustrated on small benchmarks. In particular, extending the approach to manage the performance of dynamic workloads on a target platform has not been shown earlier. We compare our scheme against a version of a Linux CPU frequency governor in terms of timing response and energy consumption. Finally, we suggest a new flavor of our controller that efficiently manages processor temperature. Again, the concept is illustrated with a realistic case study and compared to a modern temperature manager.
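The reactive closed-loop idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' controller: it assumes a plain proportional control law, and every name and number in it (`next_frequency`, `simulate`, the gain, the frequency limits) is hypothetical. Each task's timing slack is measured after it completes, and the clock frequency for the next task is raised when the task ran late (for example, because RAS mechanisms consumed extra cycles) and lowered when it finished early.

```python
def next_frequency(freq_mhz, slack_ms, gain_mhz_per_ms=10.0,
                   f_min=800.0, f_max=2400.0):
    """One proportional control step (assumed law, not from the paper).

    Negative slack (deadline missed) raises the frequency; positive
    slack lowers it. The result is clamped to [f_min, f_max].
    """
    proposed = freq_mhz - gain_mhz_per_ms * slack_ms
    return max(f_min, min(f_max, proposed))


def simulate(work_cycles, deadline_ms, freq_mhz=1200.0):
    """Run the reactive loop over a sequence of per-task cycle counts.

    Returns a list of (frequency used, observed slack) pairs, one per task.
    """
    history = []
    for cycles in work_cycles:
        # 1 MHz corresponds to 1e3 cycles per millisecond.
        elapsed_ms = cycles / (freq_mhz * 1e3)
        slack_ms = deadline_ms - elapsed_ms
        history.append((freq_mhz, slack_ms))
        # React only after the task has run: adjust for the next one.
        freq_mhz = next_frequency(freq_mhz, slack_ms)
    return history


if __name__ == "__main__":
    # Hypothetical workload: the second and third tasks carry extra
    # cycles, standing in for temporal RAS overhead.
    tasks = [12_000_000, 13_200_000, 13_200_000, 12_000_000]
    for freq, slack in simulate(tasks, deadline_ms=10.0):
        print(f"freq={freq:.1f} MHz  slack={slack:+.3f} ms")
```

The controller is reactive in the sense that it needs no prediction of the workload: it only observes slack after the fact. A real governor would also have to account for discrete frequency steps, switching latency, and energy cost, all of which this sketch ignores.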