“…The first incorporates weight clipping into AIPW (Bembom and van der Laan, 2008; Charles et al., 2013; Wang et al., 2017; Su et al., 2020), where one controls variance by shrinking the weights at the cost of introducing a small bias. The second approach, described above, is to locally stabilize the elements of the AIPW estimator (Luedtke and van der Laan, 2016; Hadad et al., 2019; Zhan et al., 2021). Our policy learning algorithm uses an estimator that falls into this second approach, where the weights h_t are chosen with the worst-case variance in mind so that the estimator is robust.…”
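
The contrast between the two approaches can be sketched in a few lines of Python. This is a minimal illustration, not the estimator of any cited paper: the function names, the `clip` parameter, and the example choice h_t = sqrt(e_t) are assumptions made here for illustration, and the excerpt does not state the actual form of its worst-case-variance weights.

```python
import numpy as np

def aipw_scores(rewards, actions, propensities, mu_hat, arm, clip=None):
    """Per-step AIPW scores for one arm (hypothetical helper).

    propensities[t] is the probability that `arm` was chosen at step t,
    mu_hat[t] is a plug-in estimate of the arm's mean reward.
    If `clip` is given, inverse-propensity weights are capped at `clip`
    (approach 1: lower variance, some bias).
    """
    weights = 1.0 / propensities
    if clip is not None:
        weights = np.minimum(weights, clip)
    chosen = (actions == arm).astype(float)
    return mu_hat + chosen * weights * (rewards - mu_hat)

def weighted_estimate(scores, h):
    """Approach 2: combine scores with per-step weights h_t, normalized."""
    return np.sum(h * scores) / np.sum(h)

# Toy data (made up for illustration only).
rewards = np.array([1.0, 0.0, 2.0, 1.0])
actions = np.array([0, 1, 0, 0])
propensities = np.array([0.5, 0.5, 0.1, 0.25])  # P(arm 0) at each step
mu_hat = np.zeros(4)

plain = aipw_scores(rewards, actions, propensities, mu_hat, arm=0)
clipped = aipw_scores(rewards, actions, propensities, mu_hat, arm=0, clip=5.0)

# Assumed illustrative weights: h_t = sqrt(e_t) down-weights steps where
# the propensity is small and the score is therefore noisy.
h = np.sqrt(propensities)
stabilized = weighted_estimate(plain, h)
uniform = weighted_estimate(plain, np.ones_like(h))
```

Clipping changes the scores themselves, so it trades bias for variance. Reweighting leaves each score unchanged and only alters how much each time step contributes to the final average.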