“…The main focus has been on learning the system dynamics and providing closed-loop guarantees in finite time for both linear systems [15], [23], [29], [42], [77] (and references therein) and nonlinear systems [5], [35], [43], [71]. For model-free RL methods, [30], [56], [60], [90] proved the convergence of policy optimization, a popular model-free RL method, to the optimal controller for linear time-invariant systems, [58], [61] for linear time-varying systems, and [75] for partially observed linear systems. See [32] for a recent review of policy optimization methods for continuous control problems such as LQR, H∞ control, risk-sensitive control, LQG, and output feedback synthesis.…”