“…The long-term goal of the agent is to maximise the cumulative expected return, thus improving its performance in the long run. Overshadowed by more traditional optimal control algorithms, Reinforcement Learning has only recently taken off in physics (Albarran-Arriagada et al., 2018; August and Hernández-Lobato, 2018; Bukov, 2018; Bukov et al., 2018; Cárdenas-López et al., 2017; Chen et al., 2014; Chen and Xue, 2019; Dunjko et al., 2017; Fösel et al., 2018; Lamata, 2017; Melnikov et al., 2017; Neukart et al., 2017; Niu et al., 2018; Ramezanpour, 2017; Reddy et al., 2016b; Sriarunothai et al., 2017; Zhang et al., 2018). Of particular interest are biophysics-inspired works that seek to use RL to understand navigation and sensing in turbulent environments (Colabrese et al., 2017; Masson et al., 2009; Reddy et al., 2016a; Vergassola et al., 2007).…”