Beam‐type collisional activation of polypeptide cations that survive ion/ion electron transfer

What is the most statistically efficient way to do off-policy optimization with batch data from bandit feedback? For log data generated by contextual bandit algorithms, we consider offline estimators for the expected reward from a counterfactual policy. Our estimators are shown to have lowest variance in a wide class of estimators, achieving variance reduction relative to standard estimators. We then apply our estimators to improve advertisement design by a major advertisement company. Consistent with the theoretical result, our estimators allow us to improve on the existing bandit algorithm with more statistical confidence compared to a state-of-theart benchmark.

show abstract

Optimal Decision Rules Under Partial Identification

Yata¹

2021

Preprint

View full text Add to dashboard Cite

I consider a class of statistical decision problems in which the policy maker must decide between two alternative policies to maximize social welfare (e.g., the population mean of an outcome) based on a finite sample. The central assumption is that the underlying, possibly infinite-dimensional parameter, lies in a known convex set, potentially leading to partial identification of the welfare effect. An example of such restrictions is the smoothness of counterfactual outcome functions. As the main theoretical result, I obtain a finite-sample decision rule (i.e., a function that maps data to a decision) that is optimal under the minimax regret criterion. This rule is easy to compute, yet achieves optimality among all decision rules; no ad hoc restrictions are imposed on the class of decision rules. I apply my results to the problem of whether to change a policy eligibility cutoff in a regression discontinuity setup. I illustrate my approach in an empirical application to the BRIGHT school construction program in Burkina Faso (Kazianga, Levy, Linden and Sloan, 2013), where villages were selected to receive schools based on scores computed from their characteristics. Under reasonable restrictions on the smoothness of the counterfactual outcome function, the optimal decision rule implies that it is not cost-effective to expand the program. I empirically compare the performance of the optimal decision rule with alternative decision rules.

show abstract

Debiased Off-Policy Evaluation for Recommendation Systems

Narita

Yasui

Yata

2021

View full text Add to dashboard Cite

Counterfactual Learning with General Data-generating Policies

Narita¹,

Okumura²,

Shimizu³

et al. 2022

Preprint

View full text Add to dashboard Cite

Efficient Counterfactual Learning from Bandit Feedback

Narita

Yasui

Yata

2018

Preprint

View full text Add to dashboard Cite

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

hi@scite.ai

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.

Kohei Yata

Efficient Counterfactual Learning from Bandit Feedback

Optimal Decision Rules Under Partial Identification

Debiased Off-Policy Evaluation for Recommendation Systems

Counterfactual Learning with General Data-generating Policies

Efficient Counterfactual Learning from Bandit Feedback

Contact Info

Product

Resources

About