This thesis considers the problem of recovering from failures of distributable threads with assured timeliness. When a node hosting a portion of a distributable thread fails, it creates orphans, i.e., thread segments that are disconnected from the thread's root. A termination model is considered for recovering from such failures. In this model, the orphans must be detected and cleaned up, and a failure-exception notification must be delivered to the farthest, contiguous surviving thread segment so that thread execution can resume. Two real-time scheduling algorithms (AUA and HUA) and three distributable thread integrity protocols (TPR, D-TPR, and W-TPR) are presented. We show that AUA combined with any of the presented protocols bounds the orphan cleanup and recovery time, thereby bounding thread starvation durations and maximizing the total thread accrued timeliness utility. The algorithms and the protocols are implemented in a real-time middleware that supports distributable threads. Experimental studies with the implementation validate the algorithms' and protocols' time-bounded recovery property and confirm their effectiveness.
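The termination model described above can be illustrated with a minimal sketch. It is not TPR, D-TPR, or W-TPR; it only shows, for a known set of failed nodes, which segment would receive the failure exception and which segments count as orphans. The function name `classify_segments` and the representation of a thread as an ordered list of hosting nodes are assumptions made for this illustration.

```python
from typing import List, Optional, Set, Tuple


def classify_segments(
    chain: List[str], failed: Set[str]
) -> Tuple[Optional[int], List[int]]:
    """Classify the segments of one distributable thread after node failures.

    chain[i] is the (hypothetical) name of the node hosting segment i, and
    segment 0 is the thread's root. Returns (handler, orphans): the index of
    the farthest contiguous surviving segment, or None if there is none, and
    the indices of surviving segments disconnected from the root.
    """
    # Index of the first segment whose hosting node has failed, if any.
    first_break = next(
        (i for i, node in enumerate(chain) if node in failed), None
    )
    if first_break is None:
        return None, []  # thread is intact: nothing to recover

    # The segment just before the break is still connected to the root.
    handler = first_break - 1 if first_break > 0 else None

    # Segments past the break on surviving nodes are cut off from the root.
    orphans = [
        i for i in range(first_break + 1, len(chain)) if chain[i] not in failed
    ]
    return handler, orphans


# A thread that spans nodes A -> B -> C -> D, where node C fails.
handler, orphans = classify_segments(["A", "B", "C", "D"], {"C"})
print(handler, orphans)  # prints: 1 [3]
```

In a real system no node has this global view; the integrity protocols exist precisely to detect the break and reach the orphans within bounded time.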
We consider the problem of recovering from the failures of distributable threads (“threads”) in distributed real-time systems that operate under runtime uncertainties, including those on thread execution times, thread arrivals, and node failure occurrences. When a thread experiences a node failure, the result is a broken thread having an orphan. Under a termination model, the orphans must be detected and aborted, and exceptions must be delivered to the farthest, contiguous surviving thread segment for resuming thread execution. Our application/scheduling model includes the proposed distributable thread programming model for the emerging Distributed Real-Time Specification for Java (DRTSJ), together with an exception-handler model. Threads are subject to time/utility function (TUF) time constraints and a utility accrual (UA) optimality criterion. A key underpinning of the TUF/UA scheduling paradigm is the notion of “best-effort,” where higher importance threads are always favored over lower importance ones, irrespective of thread urgency as specified by their time constraints. We present a thread scheduling algorithm called HUA and a thread integrity protocol called TPR. We show that HUA and TPR bound the orphan cleanup and recovery time with bounded loss of the best-effort property. Our implementation experience for HUA/TPR in the Reference Implementation of the proposed programming model for the DRTSJ demonstrates the algorithm/protocol's effectiveness.
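As a rough illustration of the TUF/UA notions mentioned above, the sketch below models a downward-step TUF (one simple TUF shape: full utility if the thread completes by its deadline, zero afterwards) and sums the utility accrued by a set of completions. It is not HUA; the names `step_tuf` and `accrued_utility`, and the numbers used, are assumptions for this example only.

```python
from typing import Callable, Dict


def step_tuf(height: float, deadline: float) -> Callable[[float], float]:
    """Return a downward-step TUF: `height` up to `deadline`, then 0."""
    def utility(completion_time: float) -> float:
        return height if completion_time <= deadline else 0.0
    return utility


def accrued_utility(
    tufs: Dict[str, Callable[[float], float]],
    completions: Dict[str, float],
) -> float:
    """Sum the utility each completed thread earns at its completion time."""
    return sum(tufs[name](t) for name, t in completions.items())


# Hypothetical threads: "hi" is more important, "lo" is more urgent.
tufs = {"hi": step_tuf(10.0, 5.0), "lo": step_tuf(2.0, 3.0)}

# Favoring "hi" lets it finish at t=4; "lo" then misses its deadline.
print(accrued_utility(tufs, {"hi": 4.0, "lo": 6.0}))  # prints: 10.0
```

The example shows why importance and urgency are distinct here: running the more important thread first forfeits the urgent one, yet it secures the larger share of the available utility.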
Networked embedded systems present challenges for designers composing distributed applications with dynamic, real-time, and resilience requirements. We consider the problem of recovering from failures of distributable threads with assured timeliness in dynamic systems with overloads, and node and (permanent/transient) network failures. When a failure prevents timely execution, the thread must be terminated, requiring detecting and aborting thread orphans and delivering exceptions to the farthest, contiguous surviving thread segment for possible resumption, while optimizing system-wide timeliness. A scheduling algorithm (HUA) and two thread integrity protocols (D-TPR and W-TPR) are presented and shown to bound orphan cleanup and recovery times with bounded loss of best-effort behavior. Implementation experience using the emerging Distributed Real-Time Specification for Java (DRTSJ) demonstrates the algorithm/protocols' effectiveness.