“…However, the other two thrusters are considered fully functional (α₁ = α₃ = 1). Unlike our previous work [4], our proposed approach benefits from the remaining functionality of the faulty thruster together with the other healthy thrusters. For each experiment we specify a final time, T, for each episode (e.g.…”
Section: Methods
Mentioning confidence: 97%
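The snippet above refers to per-thruster health coefficients α_i in [0, 1] that describe the remaining functionality of each thruster. A minimal sketch of this idea (the function name and the thrust values are hypothetical, not from the paper) scales each commanded thrust by its thruster's health coefficient:

```python
import numpy as np

def apply_thruster_health(u_cmd, alpha):
    """Scale commanded thrusts by per-thruster health coefficients.

    alpha_i = 1 means fully functional; an intermediate value models a
    partially broken thruster that still delivers a fraction of its thrust.
    """
    u_cmd = np.asarray(u_cmd, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    return alpha * u_cmd

# Illustrative values: thruster 2 at 40% capacity, alpha_1 = alpha_3 = 1.
u_eff = apply_thruster_health([10.0, 10.0, 10.0], [1.0, 0.4, 1.0])
print(u_eff)
```

Under this view, a "totally broken" thruster is simply the special case α_i = 0, which is the assumption the authors relax in this work.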
“…In the future, we plan to evaluate the efficiency of the proposed framework in real-world experiments, similar to our experiments with scalarized objectives in [4]. The idea behind this experiment is therefore to benefit from previous experience and improve the performance of the learning approach by decreasing the computation time.…”
Section: Improving the Performance
Mentioning confidence: 99%
“…The backbone of the algorithm is the weighted difference between solutions, used to perturb the population and create candidate solutions. Previously, we compared the performance of single-objective DE with two other population-based algorithms [4]-[6]. To solve the multi-objective problem described in this paper, a multi-objective differential evolution algorithm is utilized.…”
Section: F. Multi-objective Optimization Algorithms
Mentioning confidence: 99%
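The perturbation step the snippet describes, a base vector displaced by the weighted difference between two other solutions, is the core of differential evolution. The sketch below uses the common DE/rand/1/bin variant with conventional settings for F and CR; the paper does not specify its exact variant or parameters, so treat these as assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def de_candidate(pop, i, F=0.8, CR=0.9):
    """Create one trial vector for target index i (DE/rand/1/bin).

    pop is an (NP, D) population. F weights the difference vector;
    CR is the binomial crossover rate. Both are conventional defaults,
    not values taken from the paper.
    """
    NP, D = pop.shape
    # Pick three distinct indices, all different from the target i.
    others = [j for j in range(NP) if j != i]
    r1, r2, r3 = rng.choice(others, size=3, replace=False)
    # Mutant: base vector plus the weighted difference between two solutions.
    mutant = pop[r1] + F * (pop[r2] - pop[r3])
    # Binomial crossover with the target vector.
    cross = rng.random(D) < CR
    cross[rng.integers(D)] = True  # guarantee at least one mutant component
    return np.where(cross, mutant, pop[i])

pop = rng.standard_normal((10, 4))  # toy population of 10 policies, 4 params
trial = de_candidate(pop, 0)
```

In the multi-objective setting, the trial vector is then compared against the target by Pareto dominance rather than by a single scalar fitness.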
“…The approach results in expected rewards lying in a particular region of reward space. An alternative approach to specifying preferences is linear scalarization [4], [17], [18], in which the objective vector is scalarized according to a weight vector. Varying the weights allows the user to express the relative importance of the objectives.…”
Section: B. Multi-objective Reinforcement Learning
Mentioning confidence: 99%
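Linear scalarization, as contrasted with the Pareto-based approach above, collapses the objective vector into a single scalar via a weight vector. A minimal sketch (the objective names and reward values are illustrative, not from the paper):

```python
import numpy as np

def scalarize(reward_vec, weights):
    """Linear scalarization: weighted sum of an objective vector.

    Weights are normalized to sum to 1, so the result is a convex
    combination expressing the relative importance of each objective.
    """
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return float(w @ np.asarray(reward_vec, dtype=float))

# Hypothetical two-objective reward, e.g. time-to-target vs. energy use.
r = [0.8, 0.3]
print(scalarize(r, [0.5, 0.5]))  # equal importance -> 0.55
print(scalarize(r, [0.9, 0.1]))  # emphasize the first objective -> 0.75
```

The known limitation of this scheme, which motivates Pareto-based multi-objective RL, is that a fixed weight vector commits to one trade-off in advance and cannot reach solutions in non-convex regions of the Pareto front.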
“…The proposed approach learns on an on-board simulated model of the AUV. In our previous research [4]-[6], fault-tolerant control policies were discovered under the assumption that a failure leaves the thruster totally broken, i.e., a faulty thruster is equivalent to a thruster that is turned off. One of the contributions of this work is taking advantage of the remaining functionality of a partially broken thruster.…”
Abstract-This paper investigates learning approaches for discovering fault-tolerant control policies to overcome thruster failures in Autonomous Underwater Vehicles (AUVs). The proposed approach is a model-based direct policy search that learns on an on-board simulated model of the vehicle. When a fault is detected and isolated, the model of the AUV is reconfigured according to the new condition. To discover a set of optimal solutions, a multi-objective reinforcement learning approach is employed that can deal with multiple conflicting objectives. Each optimal solution can be used to generate a trajectory that navigates the AUV towards a specified target while satisfying multiple objectives. The discovered policies are executed on the robot in closed loop using the AUV's state feedback. Unlike most existing methods, which disregard the faulty thruster, our approach can also deal with partially broken thrusters to increase the persistent autonomy of the AUV. In addition, the proposed approach is applicable whether the AUV becomes underactuated or remains redundant in the presence of a fault. We validate the proposed approach on a model of the Girona500 AUV.