“…But it is used to guarantee the safety of the agent during training rather than stability. To guarantee stability, Zhang, Dong and Pan (2020) and Zhang, Pan and Reppa (2020) proposed basic-control-based SAC algorithms for ships and multi-agent systems, respectively. Furthermore, Lyapunov-based soft actor-critic algorithms have been proposed for traditional control and estimation design in our previous work (Han, Zhang, Wang, & Pan, 2020; Hu, Wu, & Pan, 2020), in which stability is proved using only data and a learned Lyapunov function constraint.…”