“…In contrast, previous works considered the policy to be a probability distribution or, more rarely, a deterministic function [4,9,28]. Finally, we apply our methodology to two different applications: personalized teaching [14,22,27] and viral marketing [12,25,29,33,34]. For simple dynamics and objective functions, which admit stochastic optimal control approaches, our method achieves performance comparable to those approaches, even though it does not have access to the true underlying dynamics.…”