“…Off-policy evaluation (OPE) has been studied extensively across a range of domains, from healthcare (Thapa et al, 2005; Raghu et al, 2018; Nie et al, 2019) to recommender systems (Li et al, 2010; Dudík et al, 2014) and robotics (Kalashnikov et al, 2018). While a full survey of OPE methods is outside the scope of this article, broadly speaking we can categorize OPE methods into groups based on their use of importance sampling (Precup, 2000), value functions (Sutton et al, 2009; Migliavacca et al, 2010; Sutton et al, 2016), and learned transition models (Paduraru, 2007), though a number of methods combine two or more of these components (Jiang & Li, 2015; Thomas & Brunskill, 2016; Munos et al, 2016).…”