2020
DOI: 10.1109/access.2020.3022600

COLREG-Compliant Collision Avoidance for Unmanned Surface Vehicle Using Deep Reinforcement Learning

Abstract: Path Following and Collision Avoidance, be it for unmanned surface vessels or other autonomous vehicles, are two fundamental guidance problems in robotics. For many decades, they have been subject to academic study, leading to a vast number of proposed approaches. However, they have mostly been treated as separate problems, and have typically relied on non-linear first-principles models with parameters that can only be determined experimentally. The rise of deep reinforcement learning in recent years suggests …

Cited by 59 publications (37 citation statements)
References 56 publications

“…To find the right balance between penalizing being off-track and avoiding obstacles—which are competing objectives—the weight parameter is used to regulate the trade-off. This structure is adapted from the work by Meyer et al (2020a) ; Meyer et al (2020b) , which performed similar experiments in 2D. In addition, we add penalties to roll, roll rate, and the use of control actuation to form the complete reward function: …”
Section: Methods and Implementation
Mentioning, confidence: 99%
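
The trade-off structure described in this statement can be illustrated with a minimal sketch. The weighting parameter, the individual penalty terms, and the coefficient values below are assumptions for illustration only; the cited works by Meyer et al define their own exact formulations.

import numpy as np

def reward(cross_track_error, obstacle_closeness, roll, roll_rate, actuation,
           lam=0.5, c_roll=0.1, c_roll_rate=0.1, c_act=0.01):
    """Hypothetical weighted reward: lam trades off path adherence against
    obstacle avoidance; extra terms penalize roll, roll rate, and control
    effort (all coefficients are illustrative, not taken from the papers)."""
    r_path = -np.abs(cross_track_error)      # penalty for being off-track
    r_avoid = -obstacle_closeness            # penalty for proximity to obstacles
    r_extra = -(c_roll * roll ** 2
                + c_roll_rate * roll_rate ** 2
                + c_act * np.sum(np.square(actuation)))
    return lam * r_path + (1.0 - lam) * r_avoid + r_extra

Here lam near 1 prioritizes staying on the path, while lam near 0 prioritizes keeping clear of obstacles, mirroring the competing objectives the statement describes.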
“…The execution layer facilitates the interaction between the deliberate and reactive architectures and decides the final commanded steering ( Tan, 2006 ). The hybrid approach is demonstrated in Meyer et al (2020a) where a DRL agent trained in a purely synthetic environment could achieve the combined objective of path following and collision avoidance with real sea traffic data (moving obstacles) in the Trondheim fjord while complying with collision avoidance regulations. There are still challenges in state-of-the-art COLAV methods for vehicles subjected to nonholonomic constraints, such as AUVs.…”
Section: Introduction
Mentioning, confidence: 99%
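
As a rough illustration of the hybrid arbitration mentioned above, the sketch below blends a deliberate (path-following) heading command with a reactive (collision-avoidance) heading command. The blending rule, the risk threshold, and the assumption of unwrapped heading angles are illustrative choices, not the scheme used in the cited work.

def execution_layer(deliberate_heading, reactive_heading, collision_risk,
                    risk_threshold=0.5):
    """Hypothetical execution layer: below the risk threshold, follow the
    deliberate planner; above it, blend toward the reactive avoidance
    command in proportion to the estimated collision risk.
    Headings are assumed to be unwrapped angles for simplicity."""
    if collision_risk < risk_threshold:
        return deliberate_heading
    # linear blend as risk grows from the threshold toward 1.0 (illustrative)
    w = min(1.0, (collision_risk - risk_threshold) / (1.0 - risk_threshold))
    return (1.0 - w) * deliberate_heading + w * reactive_heading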
“…One can investigate how the reinforcement learning (RL) agent trained with an environment with the coarse mesh behaves in the environment with fine resolution. These efforts can be built on our recent works in RL architectures [222,223,327] to devise new approaches for transfer learning from a low‐fidelity model to a high‐fidelity model. It would be a nice attempt to explore constructing new reward functions that are more suited to multi‐x environments to take into account the efficient sampling of big data.…”
Section: Hybrid Analysis and Modeling
Mentioning, confidence: 99%
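
The coarse-to-fine transfer idea raised in this statement could, in principle, be prototyped by pre-training a policy on a low-fidelity environment and continuing training on a high-fidelity one. The function below is a hypothetical placeholder loop, not an API from the cited works; the policy, environments, and training routine are all assumed inputs.

def transfer_train(policy, coarse_env, fine_env, train_fn,
                   coarse_steps=1_000_000, fine_steps=200_000):
    """Hypothetical coarse-to-fine transfer: pre-train on the cheap
    low-fidelity (coarse-mesh) environment, then fine-tune the same
    policy on the expensive high-fidelity environment."""
    train_fn(policy, coarse_env, steps=coarse_steps)   # low-fidelity pre-training
    train_fn(policy, fine_env, steps=fine_steps)       # high-fidelity fine-tuning
    return policy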
“…Therefore, hybrid methods could be instrumental in discovering new control laws. For example, recently, we show how a RL agent can learn complicated control laws through trial and error to achieve complicated tasks of path following and collision avoidance simultaneously [222,223,327]. However, such generalizable learning happens in a black‐box manner so their applicability in critical applications is foreseen to be limited unless the learned control laws can be expressed in comprehensible mathematical form.…”
Section: Big Data Cybernetics
Mentioning, confidence: 99%
“…After applying the PPO algorithm in a stochastic, synthetic environment, Meyer et al (2020a) found that the trained agent perfectly generalized to multiple real-world scenarios simulating trafficked areas in the Trondheim fjord, Norway. Meyer et al (2020a) expands on Meyer et al (2020b) by hand-crafting a reward function that encourages the RL agent to comply with the International Regulations for Preventing Collisions at Sea (COLREGs) using the PPO algorithm. Havenstrøm et al (2021) applies a curriculum learning technique with the PPO algorithm to control a 6-DOF underactuated autonomous underwater vehicle (AUV), gradually increasing the presence and severity of obstacles and disturbances during the RL training process.…”
Section: Introduction
Mentioning, confidence: 99%
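
The curriculum-learning idea attributed to Havenstrøm et al (2021), gradually increasing the presence and severity of obstacles and disturbances, can be sketched as a difficulty schedule applied to the training environment. The parameter names, ranges, and linear ramps below are illustrative assumptions, not the settings used in that paper.

def curriculum_config(progress):
    """Hypothetical curriculum schedule: progress in [0, 1] ramps up the
    number of obstacles and the disturbance magnitude as training advances
    (linear ramps and ranges are illustrative, not from the cited paper)."""
    progress = max(0.0, min(1.0, progress))
    return {
        "num_obstacles": int(round(10 * progress)),   # 0 -> 10 obstacles
        "disturbance_scale": 0.5 * progress,          # 0 -> 0.5 (arbitrary units)
    }

# Illustrative use inside a training loop, assuming an environment whose
# reset accepts these keyword arguments:
#   for step in range(total_steps):
#       env.reset(**curriculum_config(step / total_steps))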