The thermal profile of a multicore system varies both within an application's execution (intra-application) and when the system switches from one application to another (inter-application). In this paper, we propose an adaptive thermal management approach that improves the lifetime reliability of multicore systems by considering both inter- and intra-application thermal variations. Central to this approach is a reinforcement learning algorithm that learns the relationship between the mapping of threads to cores, the frequency of a core, and its temperature (sampled from on-board thermal sensors). Actions are applied by overriding the operating system's mapping decisions using affinity masks and by dynamically changing the CPU frequency using in-kernel governors. Lifetime improvement is achieved by controlling not only the peak and average temperatures but also thermal cycling, an emerging wear-out concern in modern systems. The proposed approach is validated experimentally on an Intel quad-core platform executing a diverse set of multimedia benchmarks. Results demonstrate that the proposed approach minimizes average temperature, peak temperature and thermal cycling, improving the mean-time-to-failure (MTTF) by an average of 2x for intra-application and 3x for inter-application scenarios compared to existing thermal management techniques. Furthermore, dynamic and static energy consumption are reduced by an average of 10% and 11%, respectively.
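The learning loop described above can be illustrated with tabular Q-learning, one common reinforcement learning choice; the abstract does not say which algorithm is used, so this is a minimal sketch under stated assumptions, not the authors' method. The temperature bins, frequency levels, the two mapping choices (`spread`/`packed`), the reward weights and `toy_thermal_model` are all hypothetical, and the toy model stands in for real sensor readings. The reward trades a performance proxy (frequency) against temperature and the size of the temperature swing between steps, a crude proxy for thermal cycling. On Linux, applying a chosen action would go through interfaces such as `os.sched_setaffinity` for thread placement and the cpufreq sysfs files for frequency; the sketch leaves these out.

```python
import random

# HYPOTHETICAL discretization and action space (not taken from the paper).
TEMP_BINS = [50.0, 60.0, 70.0, 80.0]      # upper bin edges in deg C; >= 80 is the last state
FREQS_GHZ = [1.6, 2.4, 3.2]               # hypothetical DVFS levels
MAPPINGS = ["spread", "packed"]           # hypothetical thread-to-core mapping choices
ACTIONS = [(m, f) for m in MAPPINGS for f in FREQS_GHZ]


def temp_state(temp_c):
    """Map a temperature reading to a discrete state index."""
    for i, edge in enumerate(TEMP_BINS):
        if temp_c < edge:
            return i
    return len(TEMP_BINS)


def reward(temp_c, prev_temp_c, freq_ghz, w_perf=20.0, w_temp=1.0, w_cycle=0.5):
    """Reward performance (frequency); penalize temperature and step-to-step swings."""
    return w_perf * freq_ghz - w_temp * temp_c - w_cycle * abs(temp_c - prev_temp_c)


def toy_thermal_model(temp_c, action, rng):
    """HYPOTHETICAL first-order stand-in for the real chip; not a physical model."""
    mapping, freq_ghz = action
    target = 35.0 + 15.0 * freq_ghz + (8.0 if mapping == "packed" else 0.0)
    return temp_c + 0.5 * (target - temp_c) + rng.uniform(-1.0, 1.0)


def train(episodes=200, steps=50, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning with an epsilon-greedy policy over ACTIONS."""
    rng = random.Random(seed)
    q = [[0.0] * len(ACTIONS) for _ in range(len(TEMP_BINS) + 1)]
    for _ in range(episodes):
        temp = 45.0  # hypothetical starting temperature
        for _ in range(steps):
            s = temp_state(temp)
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))                       # explore
            else:
                a = max(range(len(ACTIONS)), key=lambda i: q[s][i])   # exploit
            new_temp = toy_thermal_model(temp, ACTIONS[a], rng)
            r = reward(new_temp, temp, ACTIONS[a][1])
            s_next = temp_state(new_temp)
            # Standard Q-learning update toward the bootstrapped target.
            q[s][a] += alpha * (r + gamma * max(q[s_next]) - q[s][a])
            temp = new_temp
    return q


def greedy_policy(q):
    """Best (mapping, frequency) action for each temperature state."""
    return [ACTIONS[max(range(len(ACTIONS)), key=lambda i: row[i])] for row in q]


if __name__ == "__main__":
    for state, action in enumerate(greedy_policy(train())):
        print(state, action)
```

The single-step |ΔT| term is the simplest possible cycling proxy; counting temperature cycles over a window would be closer to how thermal-cycling wear is usually assessed.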
Mixed-Criticality (MC) systems have emerged as an effective solution in various industries, where multiple tasks with different real-time and safety requirements (different levels of criticality) are integrated onto a common hardware platform. In these systems, faults may occur for different reasons, e.g., hardware defects, software errors or the arrival of unexpected events. To tolerate faults in MC systems, re-execution is typically employed; this may cause high-criticality tasks (HCTs) to overrun, which in turn requires dropping low-criticality tasks (LCTs) or degrading their quality. However, frequent drops or relatively long execution times of LCTs (especially mission-critical ones) are not always acceptable and can degrade the performance or functionality of MC systems. This paper therefore proposes a realistic MC task model and develops a design-time, task-drop-aware schedulability analysis based on the Earliest Deadline First with Virtual Deadline (EDF-VD) algorithm. Under this analysis and the proposed scheduling policy, when an HCT overruns and the system switches to the high-criticality (HI) mode, the number of drops per LCT is not allowed to exceed a predefined threshold. In addition, a corresponding scheduling mechanism has been developed to guarantee the real-time constraints and safety requirements of MC tasks in the presence of faults (transient faults are assumed in this paper). According to the results of an extensive set of simulations, validated through a realistic avionics application, the proposed method improves the acceptance ratio by up to 43.9% compared to the state of the art.
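For context, the classic sufficient EDF-VD test for dual-criticality, implicit-deadline sporadic tasks on a single processor can be sketched as below. It is the baseline that a task-drop-aware analysis builds on, not the analysis proposed in this paper. With utilizations $U_{LO}^{LO}$ (LCTs), $U_{HI}^{LO}$ and $U_{HI}^{HI}$ (HCTs at their LO- and HI-mode budgets), plain EDF suffices if $U_{LO}^{LO} + U_{HI}^{HI} \le 1$; otherwise HCTs get virtual deadlines $x \cdot T_i$ with $x = U_{HI}^{LO} / (1 - U_{LO}^{LO})$, and the set is accepted if $x \cdot U_{LO}^{LO} + U_{HI}^{HI} \le 1$. The `Task` fields are assumptions, and `may_drop` is a hypothetical helper that only illustrates bookkeeping for a per-LCT drop threshold.

```python
from dataclasses import dataclass


@dataclass
class Task:
    period: float     # implicit deadline equals the period
    c_lo: float       # LO-mode WCET budget
    c_hi: float       # HI-mode WCET budget (equal to c_lo for LCTs)
    high_crit: bool   # True for an HCT


def utilizations(tasks):
    """Return (U_LO^LO, U_HI^LO, U_HI^HI)."""
    u_lo_lo = sum(t.c_lo / t.period for t in tasks if not t.high_crit)
    u_hi_lo = sum(t.c_lo / t.period for t in tasks if t.high_crit)
    u_hi_hi = sum(t.c_hi / t.period for t in tasks if t.high_crit)
    return u_lo_lo, u_hi_lo, u_hi_hi


def edf_vd_test(tasks):
    """Return (schedulable, x); x scales HCT deadlines (1.0 means plain EDF)."""
    u_lo_lo, u_hi_lo, u_hi_hi = utilizations(tasks)
    if u_lo_lo + u_hi_hi <= 1.0:
        return True, 1.0
    if u_lo_lo >= 1.0:
        return False, None
    x = u_hi_lo / (1.0 - u_lo_lo)
    return x * u_lo_lo + u_hi_hi <= 1.0, x


def may_drop(drop_counts, task_id, threshold):
    """HYPOTHETICAL guard: allow dropping an LCT only while its drop count is below threshold."""
    count = drop_counts.get(task_id, 0)
    if count < threshold:
        drop_counts[task_id] = count + 1
        return True
    return False
```

For example, one LCT with utilization 0.4 and one HCT with utilizations 0.2 (LO) and 0.7 (HI) fails the plain-EDF check (0.4 + 0.7 > 1) but passes EDF-VD with x = 1/3, since 0.4/3 + 0.7 ≤ 1.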