We propose a new framework for the (length,reliability) bicriteria static multiprocessor scheduling problem. Our first criterion remains the schedule's length, crucial to assess the system's realtime property. For our second criterion, we consider the global system failure rate, seen as if the whole system were a single task scheduled onto a single processor, instead of the usual reliability, because it does not depend on the schedule length like the reliability does (due to its computation in the classical exponential distribution model). Therefore, we control better the replication factor of each individual task of the dependency task graph given as a specification, with respect to the desired failure rate. To solve this bicriteria optimization problem, we take the failure rate as a constraint, and we minimize the schedule length. We are thus able to produce, for a given dependency task graph and multiprocessor architecture, a Pareto curve of non-dominated solutions, among which the user can choose the compromise that fits his requirements best. Compared to the other bicriteria (length,reliability) scheduling algorithms found in the literature, the algorithm we present here is the first able to improve significantly the reliability, by several orders of magnitude, making it suitable to safety critical systems.
Our goal is to automatically obtain a distributed and fault-tolerant embedded system: distributed because the system must run on a distributed architecture; fault-tolerant because the system is critical.
Abstract. We propose an off-line scheduling heuristics which, from a given software application graph and a given multiprocessor architecture (homogeneous and fully connected), produces a static multiprocessor schedule that optimizes three criteria: its length (crucial for real-time systems), its reliability (crucial for dependable systems), and its power consumption (crucial for autonomous systems). Our tricriteria scheduling heuristics, TSH, uses the active replication of the operations and the data-dependencies to increase the reliability, and uses dynamic voltage and frequency scaling to lower the power consumption.
For autonomous critical real-time embedded systems (e.g., satellite), guaranteeing a very high level of reliability is as important as keeping the power consumption as low as possible. We propose an off-line scheduling heuristic which, from a given software application graph and a given multiprocessor architecture (homogeneous and fully connected), produces a static multiprocessor schedule that optimizes three criteria: its length (crucial for real-time systems), its reliability (crucial for dependable systems), and its power consumption (crucial for autonomous systems). Our tricriteria scheduling heuristic, called TSH, uses the active replication of the operations and the data-dependencies to increase the reliability, and uses dynamic voltage and frequency scaling to lower the power consumption. We demonstrate the soundness of TSH. We also provide extensive simulation results to show how TSH behaves in practice: Firstly, we run TSH on a single instance to provide the whole Pareto front in 3D; Secondly, we compare TSH versus the ECS heuristic (Energy-Conscious Scheduling) from the literature; And thirdly, we compare TSH versus an optimal Mixed Linear Integer Program.
In this paper, we present a new tri-criteria scheduling heuristic for scheduling data-flow graphs of operations onto parallel heterogeneous architectures according to three criteria: first the minimization of the schedule length crucial for real-time systems, second the maximization of the system reliability crucial for dependable systems, and third minimizing energy consumption crucial for autonomous systems. The proposed algorithm is a list scheduling heuristics, It uses the active replication of operations to improve the reliability and the dynamic voltage scaling to minimize the energy consumption.
Hardware fault tolerance is an important consideration in critical distributed realtime embedded systems and has been extensively researched. In these systems, critical real-time constraints must be satisfied even in the presence of hardware component failures. Our goal is to propose a solution to automatically produce a fault-tolerant distributed schedule of a given algorithm onto a given distributed architecture, according to real-time constraints. The distributed architectures we consider have bidirectional point-to-point communication links. Our solution is a list scheduling heuristics, based on disjoint paths to tolerate a fixed number of arbitrary processor and communication link failures. Because of the resource limitation in embedded systems, our heuristics implements a software solution based on the active replication technique, where each operation of the algorithm is replicated on different processors. With a detailed example, we show the techniques used to satisfy the real-time constraints and tolerate the failure of processor and communication links. Simulations show the efficiency of our method compared with other heuristics found in the literature.
Abstract:Embedded real-time systems are being increasingly used in a major part of critical applications. In these systems, critical real-time constraints must be satisfied even in the presence of failures. In this paper, we present a new method-based on graph transformation that introduces fault-tolerance in building embedded real-time systems. The proposed method targets distributed architecture and can tolerate a fixed number of arbitrary processors and communication links failures. Because of the resource limitation in embedded systems, our method uses a software-based replication technique to provide fault-tolerance. Finally, since we use graph transformation to perform replication, our method may be used by any off-line distribution-scheduling algorithm to generate a fault-tolerant distributed schedule.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.