2020
DOI: 10.1109/access.2020.3022600

COLREG-Compliant Collision Avoidance for Unmanned Surface Vehicle Using Deep Reinforcement Learning

Abstract: Path Following and Collision Avoidance, be it for unmanned surface vessels or other autonomous vehicles, are two fundamental guidance problems in robotics. For many decades, they have been subject to academic study, leading to a vast number of proposed approaches. However, they have mostly been treated as separate problems, and have typically relied on non-linear first-principles models with parameters that can only be determined experimentally. The rise of deep reinforcement learning in recent years suggests …

Cited by 59 publications (37 citation statements)
References 56 publications

“…To find the right balance between penalizing being off-track and avoiding obstacles—which are competing objectives—the weight parameter is used to regulate the trade-off. This structure is adapted from the work by Meyer et al (2020a) ; Meyer et al (2020b) , which performed similar experiments in 2D. In addition, we add penalties to roll, roll rate, and the use of control actuation to form the complete reward function: …”
Section: Methods and Implementation
Mentioning, confidence: 99%
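
The trade-off structure described in this statement can be illustrated with a minimal sketch. The weighting parameter, the individual penalty terms, and the coefficient values below are assumptions for illustration only; the cited works by Meyer et al define their own exact formulations.

import numpy as np

def reward(cross_track_error, obstacle_closeness, roll, roll_rate, actuation,
           lam=0.5, c_roll=0.1, c_roll_rate=0.1, c_act=0.01):
    """Hypothetical weighted reward: lam trades off path adherence against
    obstacle avoidance; extra terms penalize roll, roll rate, and control
    effort (all coefficients are illustrative, not taken from the papers)."""
    r_path = -np.abs(cross_track_error)      # penalty for being off-track
    r_avoid = -obstacle_closeness            # penalty for proximity to obstacles
    r_extra = -(c_roll * roll ** 2
                + c_roll_rate * roll_rate ** 2
                + c_act * np.sum(np.square(actuation)))
    return lam * r_path + (1.0 - lam) * r_avoid + r_extra

Here lam near 1 prioritizes staying on the path, while lam near 0 prioritizes keeping clear of obstacles, mirroring the competing objectives the statement describes.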
“…The execution layer facilitates the interaction between the deliberate and reactive architectures and decides the final commanded steering ( Tan, 2006 ). The hybrid approach is demonstrated in Meyer et al (2020a) where a DRL agent trained in a purely synthetic environment could achieve the combined objective of path following and collision avoidance with real sea traffic data (moving obstacles) in the Trondheim fjord while complying with collision avoidance regulations. There are still challenges in state-of-the-art COLAV methods for vehicles subjected to nonholonomic constraints, such as AUVs.…”
Section: Introduction
Mentioning, confidence: 99%
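
As a rough illustration of the hybrid arbitration mentioned above, the sketch below blends a deliberate (path-following) heading command with a reactive (collision-avoidance) heading command. The blending rule, the risk threshold, and the assumption of unwrapped heading angles are illustrative choices, not the scheme used in the cited work.

def execution_layer(deliberate_heading, reactive_heading, collision_risk,
                    risk_threshold=0.5):
    """Hypothetical execution layer: below the risk threshold, follow the
    deliberate planner; above it, blend toward the reactive avoidance
    command in proportion to the estimated collision risk.
    Headings are assumed to be unwrapped angles for simplicity."""
    if collision_risk < risk_threshold:
        return deliberate_heading
    # linear blend as risk grows from the threshold toward 1.0 (illustrative)
    w = min(1.0, (collision_risk - risk_threshold) / (1.0 - risk_threshold))
    return (1.0 - w) * deliberate_heading + w * reactive_heading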
“…One can investigate how the reinforcement learning (RL) agent trained with an environment with the coarse mesh behaves in the environment with fine resolution. These efforts can be built on our recent works in RL architectures [222,223,327] to devise new approaches for transfer learning from a low‐fidelity model to a high‐fidelity model. It would be a nice attempt to explore constructing new reward functions that are more suited to multi‐x environments to take into account the efficient sampling of big data.…”
Section: Hybrid Analysis and Modeling
Mentioning, confidence: 99%
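
The coarse-to-fine transfer idea raised in this statement could, in principle, be prototyped by pre-training a policy on a low-fidelity environment and continuing training on a high-fidelity one. The function below is a hypothetical placeholder loop, not an API from the cited works; the policy, environments, and training routine are all assumed inputs.

def transfer_train(policy, coarse_env, fine_env, train_fn,
                   coarse_steps=1_000_000, fine_steps=200_000):
    """Hypothetical coarse-to-fine transfer: pre-train on the cheap
    low-fidelity (coarse-mesh) environment, then fine-tune the same
    policy on the expensive high-fidelity environment."""
    train_fn(policy, coarse_env, steps=coarse_steps)   # low-fidelity pre-training
    train_fn(policy, fine_env, steps=fine_steps)       # high-fidelity fine-tuning
    return policy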
“…Therefore, hybrid methods could be instrumental in discovering new control laws. For example, recently, we show how a RL agent can learn complicated control laws through trial and error to achieve complicated tasks of path following and collision avoidance simultaneously [222,223,327]. However, such generalizable learning happens in a black‐box manner so their applicability in critical applications is foreseen to be limited unless the learned control laws can be expressed in comprehensible mathematical form.…”
Section: Big Data Cybernetics
Mentioning, confidence: 99%
“…After applying the PPO algorithm in a stochastic, synthetic environment, Meyer et al (2020a) found that the trained agent perfectly generalized to multiple real-world scenarios simulating trafficked areas in the Trondheim fjord, Norway. Meyer et al (2020a) expands on Meyer et al (2020b) by hand-crafting a reward function that encourages the RL agent to comply with the International Regulations for Preventing Collisions at Sea (COLREGs) using the PPO algorithm. Havenstrøm et al (2021) applies a curriculum learning technique with the PPO algorithm to control a 6-DOF underactuated autonomous underwater vehicle (AUV), gradually increasing the presence and severity of obstacles and disturbances during the RL training process.…”
Section: Introduction
Mentioning, confidence: 99%
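
The curriculum-learning idea attributed to Havenstrøm et al (2021), gradually increasing the presence and severity of obstacles and disturbances, can be sketched as a difficulty schedule applied to the training environment. The parameter names, ranges, and linear ramps below are illustrative assumptions, not the settings used in that paper.

def curriculum_config(progress):
    """Hypothetical curriculum schedule: progress in [0, 1] ramps up the
    number of obstacles and the disturbance magnitude as training advances
    (linear ramps and ranges are illustrative, not from the cited paper)."""
    progress = max(0.0, min(1.0, progress))
    return {
        "num_obstacles": int(round(10 * progress)),   # 0 -> 10 obstacles
        "disturbance_scale": 0.5 * progress,          # 0 -> 0.5 (arbitrary units)
    }

# Illustrative use inside a training loop, assuming an environment whose
# reset accepts these keyword arguments:
#   for step in range(total_steps):
#       env.reset(**curriculum_config(step / total_steps))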