“…However, the other two thrusters are considered fully functional (α₁ = α₃ = 1). Unlike our previous work [4], our proposed approach benefits from the remaining functionality of the faulty thruster together with the other healthy thrusters. For each experiment we specify a final time, T, for each episode (e.g.…”
Section: Methods
Mentioning confidence: 97%
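The snippet above refers to per-thruster health coefficients α_i in [0, 1] that describe the remaining functionality of each thruster. A minimal sketch of this idea (the function name and the thrust values are hypothetical, not from the paper) scales each commanded thrust by its thruster's health coefficient:

```python
import numpy as np

def apply_thruster_health(u_cmd, alpha):
    """Scale commanded thrusts by per-thruster health coefficients.

    alpha_i = 1 means fully functional; an intermediate value models a
    partially broken thruster that still delivers a fraction of its thrust.
    """
    u_cmd = np.asarray(u_cmd, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    return alpha * u_cmd

# Illustrative values: thruster 2 at 40% capacity, alpha_1 = alpha_3 = 1.
u_eff = apply_thruster_health([10.0, 10.0, 10.0], [1.0, 0.4, 1.0])
print(u_eff)
```

Under this view, a "totally broken" thruster is simply the special case α_i = 0, which is the assumption the authors relax in this work.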
“…In the future, we plan to evaluate the efficiency of the proposed framework in real-world experiments, similar to our experiments with scalarized objectives in [4]. The idea behind this experiment is therefore to benefit from previous experience and improve the performance of the learning approach by decreasing the computation time.…”
Section: Improving the Performance
Mentioning confidence: 99%
“…The backbone of the algorithm is the weighted difference between solutions, used to perturb the population and create candidate solutions. Previously, we compared the performance of single-objective DE with two other population-based algorithms [4]-[6]. To solve the multi-objective problem described in this paper, a multi-objective differential evolution algorithm is utilized.…”
Section: F. Multi-objective Optimization Algorithms
Mentioning confidence: 99%
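The perturbation step the snippet describes, a base vector displaced by the weighted difference between two other solutions, is the core of differential evolution. The sketch below uses the common DE/rand/1/bin variant with conventional settings for F and CR; the paper does not specify its exact variant or parameters, so treat these as assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def de_candidate(pop, i, F=0.8, CR=0.9):
    """Create one trial vector for target index i (DE/rand/1/bin).

    pop is an (NP, D) population. F weights the difference vector;
    CR is the binomial crossover rate. Both are conventional defaults,
    not values taken from the paper.
    """
    NP, D = pop.shape
    # Pick three distinct indices, all different from the target i.
    others = [j for j in range(NP) if j != i]
    r1, r2, r3 = rng.choice(others, size=3, replace=False)
    # Mutant: base vector plus the weighted difference between two solutions.
    mutant = pop[r1] + F * (pop[r2] - pop[r3])
    # Binomial crossover with the target vector.
    cross = rng.random(D) < CR
    cross[rng.integers(D)] = True  # guarantee at least one mutant component
    return np.where(cross, mutant, pop[i])

pop = rng.standard_normal((10, 4))  # toy population of 10 policies, 4 params
trial = de_candidate(pop, 0)
```

In the multi-objective setting, the trial vector is then compared against the target by Pareto dominance rather than by a single scalar fitness.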
“…The approach results in expected rewards lying in a particular region of reward space. An alternative approach to specifying preferences is linear scalarization [4], [17], [18], in which the objective vector is scalarized according to a weight vector. Varying the weights allows the user to express the relative importance of the objectives.…”
Section: B. Multi-objective Reinforcement Learning
Mentioning confidence: 99%
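Linear scalarization, as contrasted with the Pareto-based approach above, collapses the objective vector into a single scalar via a weight vector. A minimal sketch (the objective names and reward values are illustrative, not from the paper):

```python
import numpy as np

def scalarize(reward_vec, weights):
    """Linear scalarization: weighted sum of an objective vector.

    Weights are normalized to sum to 1, so the result is a convex
    combination expressing the relative importance of each objective.
    """
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return float(w @ np.asarray(reward_vec, dtype=float))

# Hypothetical two-objective reward, e.g. time-to-target vs. energy use.
r = [0.8, 0.3]
print(scalarize(r, [0.5, 0.5]))  # equal importance -> 0.55
print(scalarize(r, [0.9, 0.1]))  # emphasize the first objective -> 0.75
```

The known limitation of this scheme, which motivates Pareto-based multi-objective RL, is that a fixed weight vector commits to one trade-off in advance and cannot reach solutions in non-convex regions of the Pareto front.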
“…The proposed approach learns on an on-board simulated model of the AUV. In our previous research [4]-[6], fault-tolerant control policies were discovered under the assumption that a failure leaves the thruster totally broken, i.e., a faulty thruster is equivalent to a thruster that is turned off. One of the contributions of this work is taking advantage of the remaining functionality of a partially broken thruster.…”
Abstract-This paper investigates learning approaches for discovering fault-tolerant control policies to overcome thruster failures in Autonomous Underwater Vehicles (AUVs). The proposed approach is a model-based direct policy search that learns on an on-board simulated model of the vehicle. When a fault is detected and isolated, the model of the AUV is reconfigured according to the new condition. To discover a set of optimal solutions, a multi-objective reinforcement learning approach is employed that can deal with multiple conflicting objectives. Each optimal solution can be used to generate a trajectory that navigates the AUV towards a specified target while satisfying multiple objectives. The discovered policies are executed on the robot in closed loop using the AUV's state feedback. Unlike most existing methods, which disregard the faulty thruster, our approach can also deal with partially broken thrusters to increase the persistent autonomy of the AUV. In addition, the proposed approach is applicable whether the AUV becomes underactuated or remains redundant in the presence of a fault. We validate the proposed approach on a model of the Girona500 AUV.