“…Non-asymptotic analysis of (natural) policy gradient methods. Moving beyond tabular MDPs, finite-time convergence guarantees for PG/NPG methods and their variants have recently been established for control problems (e.g., [18,19,44,58]), regularized MDPs (e.g., [11,24,54]), constrained MDPs (e.g., [15,50]), robust MDPs (e.g., [29,60]), MDPs with function approximation (e.g., [1,2,10,25,30,45]), Markov games (e.g., [13,14,46,49,61]), as well as for their use in actor-critic methods (e.g., [3,12,48,51]).…”