2021
DOI: 10.3389/frobt.2021.738113

Comparing Deep Reinforcement Learning Algorithms’ Ability to Safely Navigate Challenging Waters

Abstract: Reinforcement Learning (RL) controllers have proved effective at the dual objectives of path following and collision avoidance. However, it is not obvious which RL algorithm setup optimally trades off these two tasks. This work proposes a methodology for exploring that question by analyzing the performance and task-specific behavioral characteristics of a range of RL algorithms applied to path following and collision avoidance for underactuated surface vehicles in environments of increa…

Cited by 20 publications (7 citation statements) · References 25 publications
“…We prefer DDPG over TRPO because DDPG is computationally less expensive than TRPO, making it a better choice for problems with a large state or action space or where data collection is expensive. DDPG is generally more sample-efficient than TRPO and has been shown to converge faster than TRPO in some cases, making it a better choice for problems where the agent needs to learn quickly [36].…”

Section: Reinforcement Learning for Aerial-IRS Trajectory Optimization
confidence: 99%
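The sample-efficiency claim above hinges on DDPG being off-policy: transitions stored in a replay buffer remain valid training data and can be reused across many updates, whereas an on-policy method such as TRPO must collect a fresh batch under its current policy for every update. Below is a minimal Python sketch of that reuse pattern; it is illustrative only (not code from the cited works), and the environment interaction is a hypothetical placeholder.

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-size store of (state, action, reward, next_state, done) tuples."""
    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)

    def add(self, transition):
        self.buffer.append(transition)

    def sample(self, batch_size=64):
        # Off-policy reuse: old transitions stay valid training data,
        # so each environment step can feed many gradient updates.
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))

buffer = ReplayBuffer()
for step in range(1000):
    # Placeholder transition; a real agent would query an environment here.
    buffer.add((step, 0, 0.0, step + 1, False))
    if len(buffer.buffer) >= 64:
        batch = buffer.sample()  # a DDPG-style update would consume this batch

# TRPO, by contrast, gathers a fresh on-policy rollout for each policy
# update and discards it afterwards, which costs more environment samples.
```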
“…There are many model-free DRL frameworks developed in the past decade [36]–[40]. The key features of each framework include whether it is value optimization or policy optimization-based;…”

Section: Deep Reinforcement Learning Algorithm
confidence: 99%
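To make the value-optimization versus policy-optimization distinction concrete, the toy sketch below contrasts a tabular Q-learning step (value-based: improve an action-value estimate and act greedily on it) with a REINFORCE-style softmax policy-gradient step (policy-based: adjust policy parameters directly along the gradient of expected return). This is an illustrative sketch, not code from any of the cited frameworks; the array shapes, learning rates, and return `G` are assumed values.

```python
import numpy as np

def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """Value optimization: move Q(s, a) toward a bootstrapped TD target."""
    td_target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (td_target - Q[s, a])
    return Q

def policy_gradient_step(theta, s_onehot, a, G, alpha=0.01):
    """Policy optimization: REINFORCE update for a linear softmax policy."""
    logits = theta @ s_onehot
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    # Gradient of log pi(a|s) for a linear softmax parameterization.
    grad_log_pi = -np.outer(probs, s_onehot)
    grad_log_pi[a] += s_onehot
    return theta + alpha * G * grad_log_pi

Q = q_update(np.zeros((5, 2)), s=0, a=1, r=1.0, s_next=2)                 # value-based
theta = policy_gradient_step(np.zeros((2, 5)), np.eye(5)[0], a=1, G=1.0)  # policy-based
```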
“…3) On-Policy vs. Off-Policy in DRL: Policy-based DRL algorithms with stochastic policy can be further categorized into on-policy and off-policy learning methods [39]. SAC adopts an off-policy learning method, while TRPO and PPO are on-policy ones [40]. An off-policy algorithm learns the optimal policy (approximated by a target NN) that is different from the behavior policy (approximated by a behavior NN) for generating new experiences during training.…”

Section: Deep Reinforcement Learning Algorithm
confidence: 99%
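As an illustration of the behavior-network/target-network split described in this statement, the sketch below shows the Polyak (soft) target update used by off-policy actor-critic methods such as DDPG and SAC: the behavior network generates experience while a lagged target copy provides stable bootstrap targets. The network sizes and the `tau` coefficient are illustrative assumptions, not values from the cited papers.

```python
import copy
import torch
import torch.nn as nn

# Behavior network: selects actions (typically with exploration noise)
# and generates new experiences during training.
behavior_net = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))

# Target network: a lagged copy used to compute bootstrap targets, so the
# policy being learned differs from the one generating the experience.
target_net = copy.deepcopy(behavior_net)

def soft_update(target, source, tau=0.005):
    """Polyak averaging: target <- (1 - tau) * target + tau * source."""
    with torch.no_grad():
        for t_param, s_param in zip(target.parameters(), source.parameters()):
            t_param.mul_(1.0 - tau).add_(tau * s_param)

soft_update(target_net, behavior_net)  # called once per training step
```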
“…Additionally, Ryohei Sawada applied PPO in combination with LSTM neural networks to achieve autonomous ship collision avoidance in continuous action spaces [30]. Most recently, in 2021, Thomas Nakken Larsen compared the effectiveness of various DRL algorithms for safe navigation in challenging waterways [31].…”

Section: Introduction
confidence: 99%