Anais do XV Encontro Nacional de Inteligência Artificial e Computacional (ENIAC 2018), 2018
DOI: 10.5753/eniac.2018.4422

Batch Reinforcement Learning of Feasible Trajectories in a Ship Maneuvering Simulator

Abstract: Ship control in port channels is a challenging problem that has resisted automated solutions. In this paper we focus on reinforcement learning of control signals so as to steer ships in their maneuvers. The learning process uses fitted Q iteration together with a Ship Maneuvering Simulator. Domain knowledge is used to develop a compact state-space model; we show how this model and the learning process lead to ship maneuvering under difficult conditions.
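
The abstract names fitted Q iteration, a batch reinforcement learning method that repeatedly fits a regressor to Bellman targets computed over a fixed set of logged transitions. The Python sketch below only illustrates that generic scheme under stated assumptions: the function name fitted_q_iteration, the discretized action set, the extra-trees regressor, and the constants GAMMA and N_ITERATIONS are illustrative choices, not details taken from the paper.

import numpy as np
from sklearn.ensemble import ExtraTreesRegressor

GAMMA = 0.99                              # discount factor (assumed)
ACTIONS = np.linspace(-1.0, 1.0, 5)       # e.g. a discretized rudder command (assumed)
N_ITERATIONS = 50                         # number of regression sweeps (assumed)

def fitted_q_iteration(batch):
    """batch: list of (state, action, reward, next_state) transitions
    collected from simulator runs; terminal handling omitted for brevity."""
    X = np.array([np.append(s, a) for s, a, _, _ in batch])
    rewards = np.array([r for _, _, r, _ in batch])
    next_states = np.array([s_next for _, _, _, s_next in batch])

    model = None
    for _ in range(N_ITERATIONS):
        if model is None:
            targets = rewards                 # first sweep regresses the immediate reward
        else:
            # Bellman targets: r + gamma * max_a' Q(s', a'), evaluated with the
            # regressor fitted in the previous sweep.
            q_next = np.column_stack([
                model.predict(np.column_stack([next_states,
                                               np.full(len(next_states), a)]))
                for a in ACTIONS
            ])
            targets = rewards + GAMMA * q_next.max(axis=1)
        model = ExtraTreesRegressor(n_estimators=50).fit(X, targets)
    return model

The returned model approximates Q(state, action); a greedy policy then picks, at each step, the action in ACTIONS with the highest predicted value.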

Cited by 3 publications (2 citation statements)
References 15 publications (10 reference statements)

“…Channel navigation was handled as a path-following problem. In (AMENDOLA et al, 2018), the port channel adopted in the experiments was straight, no environmental conditions were considered, the state space adopted was simpler, and an algorithm called fitted Q iteration was applied. State space in path planning tasks carries information about the absolute position in a given scenario. The policy, in this case, is learned for that specific setup with very limited capacity for generalization.…”
Section: Literature Review (mentioning, confidence: 99%)
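
The statement above contrasts state spaces built from absolute position, which tie the learned policy to one specific channel layout, with more compact representations. The sketch below only illustrates that distinction; the field and method names (ship.x, ship.heading, channel.signed_distance_to_centerline, and so on) are hypothetical and not taken from either paper.

import numpy as np

def absolute_state(ship):
    # Coordinates in a fixed chart frame: a policy trained on these features
    # is tied to the scenario in which they were collected.
    return np.array([ship.x, ship.y, ship.heading])

def relative_state(ship, channel):
    # Features expressed relative to the channel centerline, so the same
    # representation applies to other straight channel segments.
    cross_track = channel.signed_distance_to_centerline(ship.x, ship.y)
    heading_err = ship.heading - channel.centerline_bearing
    return np.array([cross_track, heading_err, ship.surge_speed, ship.yaw_rate])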
“…(RANDLØV; ALSTRØM, 1998) argued that negative values prevent the agent from passing through states unnecessarily just to accumulate positive rewards. Nonetheless, in previous work (AMENDOLA et al, 2018), reward functions with negative values over most of the state space led the agent to seek collisions quickly in order to minimize the accumulated negative reward. As the limited available space in a channel prevents the vessel from navigating far from the centerline, positive rewards are not an issue.…”
Section: Reward (mentioning, confidence: 99%)
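
The statement above motivates reward functions that are non-negative over most of the state space, so the agent is not pushed toward early collisions just to cut short an accumulating punishment. The sketch below is one possible shaping consistent with that argument; the Gaussian form, the MAX_OFFSET value, and the terminal penalty are assumptions for illustration, not the reward used in the papers.

import math

MAX_OFFSET = 50.0   # assumed half-width of the navigable channel, in metres

def reward(cross_track_error, collided):
    if collided:
        # A single terminal penalty instead of per-step punishment, so the
        # agent gains nothing by ending the episode early.
        return -1.0
    # Positive reward concentrated around the centerline and close to zero
    # near the channel margins.
    return math.exp(-(cross_track_error / (MAX_OFFSET / 3.0)) ** 2)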