“…Since then, DRL has been successful in surpassing all previous computer programs in chess and learning how to accomplish complex robotic tasks (Silver et al, 2017;Andrychowicz et al, 2018). Given DRL's ability to tackle problems with high uncertainty, implementations to motion control scenarios involving marine vessels have been presented recently (Shen and Guo, 2016;Zhang et al, 2016;Pham Tuyen et al, 2017;Yu et al, 2017;Cheng and Zhang, 2018;Martinsen and Lekkas, 2018a,b). In most of these works the authors implemented algorithms pertaining to the class of actor-critic RL methods, which involves two parts (Konda and Tsitsiklis, 2000): The actor, where the gradient of the performance is estimated and the policy parameters are directly updated in a direction of improvement.…”