“…This paper builds on the authors' previous work in [22], [23], where concurrent learning (CL) update laws are utilized to estimate reward functions online using output feedback. However, the dynamical systems in [22], [23] are required to be in Brunovsky canonical form, and as such, only the output-feedback case where the state comprises the output and its derivatives is addressed. In contrast, the IRL observer (IRL-O) technique in this paper generalizes to any observable linear system, since the developed IRL-Os are in a standard observer form in which the state estimates are corrected based on the innovation (i.e., the error between the actual and the estimated output).…”
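The "standard observer form" referenced in the excerpt can be illustrated with a classical Luenberger observer, where the state estimate is driven by the innovation term. The sketch below is not the paper's IRL-O; it is a minimal illustration, assuming a hypothetical double-integrator system and a hand-picked observer gain, of how an innovation-based correction makes the estimate converge for any observable linear system.

```python
import numpy as np

# Hypothetical observable linear system (double integrator), not from the paper:
#   x_dot = A x + B u,  y = C x
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])

# Observer gain L chosen so A - L C has eigenvalues {-1, -2} (stable error dynamics)
L = np.array([[3.0], [2.0]])

dt, T = 0.001, 10.0
x = np.array([[1.0], [-0.5]])   # true state (unknown to the observer)
xhat = np.zeros((2, 1))         # observer initialized with no knowledge of x

for k in range(int(T / dt)):
    u = np.array([[np.sin(k * dt)]])          # arbitrary known input signal
    y = C @ x                                 # measured output
    innovation = y - C @ xhat                 # error between actual and estimated output
    # Forward-Euler integration of plant and observer:
    x = x + dt * (A @ x + B @ u)
    xhat = xhat + dt * (A @ xhat + B @ u + L @ innovation)

err = float(np.linalg.norm(x - xhat))
print(err)  # estimation error is driven toward zero by the innovation term
```

Because the estimation-error dynamics are governed by A - LC, the correction works for any observable (A, C) pair once L is chosen to place the error eigenvalues in the open left half-plane; this is the generality the excerpt contrasts with the Brunovsky-form restriction of [22], [23].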