“…This way of designing the control policy is also known as the certainty equivalence approach (e.g., [4]). Specifically, the authors in [10,25] provided an online algorithm for the LQR problem with unknown system matrices and showed that the regret of the algorithm is Õ(√N), where N is the number of time steps in the LQR problem and Õ(·) hides logarithmic factors in N. Note that the authors in [1,11,10,25] considered the infinite-horizon LQR setting.…”
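As a rough illustration of the certainty equivalence idea mentioned in the excerpt, the sketch below estimates an unknown scalar system from data and then computes the LQR gain as if the estimates were the true parameters. It is not the online algorithm of [10,25]. The scalar model x[t+1] = a*x[t] + b*u[t] + w[t], the cost weights q and r, and all function names (`simulate`, `estimate_scalar_system`, `lqr_gain_scalar`) are assumptions made only for this example.

```python
import random


def simulate(a, b, steps=500, noise_std=0.1, seed=0):
    """Roll out x[t+1] = a*x[t] + b*u[t] + w[t] under random exploratory inputs."""
    rng = random.Random(seed)
    xs, us = [0.0], []
    for _ in range(steps):
        u = rng.gauss(0.0, 1.0)          # exploratory input (assumed Gaussian)
        w = rng.gauss(0.0, noise_std)    # process noise
        us.append(u)
        xs.append(a * xs[-1] + b * u + w)
    return xs, us


def estimate_scalar_system(xs, us):
    """Least-squares estimate of (a, b) from one trajectory (2x2 normal equations)."""
    prev, nxt = xs[:-1], xs[1:]
    sxx = sum(x * x for x in prev)
    sxu = sum(x * u for x, u in zip(prev, us))
    suu = sum(u * u for u in us)
    sxy = sum(x * y for x, y in zip(prev, nxt))
    suy = sum(u * y for u, y in zip(us, nxt))
    det = sxx * suu - sxu * sxu
    a_hat = (sxy * suu - sxu * suy) / det
    b_hat = (sxx * suy - sxu * sxy) / det
    return a_hat, b_hat


def lqr_gain_scalar(a, b, q=1.0, r=1.0, iters=500):
    """Infinite-horizon LQR gain k (policy u = -k*x) via scalar Riccati iteration."""
    p = q
    for _ in range(iters):
        p = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
    return a * b * p / (r + b * b * p)


if __name__ == "__main__":
    a_true, b_true = 0.9, 0.5            # hypothetical "unknown" system
    xs, us = simulate(a_true, b_true)
    a_hat, b_hat = estimate_scalar_system(xs, us)
    # Certainty equivalence: design the controller as if the estimates were exact.
    k = lqr_gain_scalar(a_hat, b_hat)
    print("estimates:", a_hat, b_hat)
    print("gain:", k, "closed-loop pole:", a_true - b_true * k)
```

The estimate and design steps are kept separate on purpose: the controller never sees the true parameters, only the least-squares estimates, which is the defining feature of a certainty-equivalent design.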