Near-optimal Individualized Treatment Recommendations

Meng, Haomiao; Zhao, Ying-Qi; Fu, Haoda; Qiao, Xingye

doi:10.48550/arxiv.2004.02772

Cited by 2 publications

(2 citation statements)

References 29 publications

“…Recently, a few methods have been developed in the statistics literature on learning the optimal policy in mobile health applications (Ertefaie 2014; Luckett et al 2020; Hu et al 2020; Liao, Qi, and Murphy 2020). In addition, there is a growing literature on adapting reinforcement learning to develop dynamic treatment regimes in precision medicine, to recommend treatment decisions based on individual patients' information (Murphy 2003; Chakraborty, Murphy, and Strecher 2010; Qian and Murphy 2011; Zhao et al 2012; Zhang et al 2013; Song et al 2015; Zhao et al 2015; Zhu et al 2017; Zhang et al 2018; Wang et al 2018; Shi et al 2018a, 2018b; Mo, Qi, and Liu 2020; Meng et al 2020).…”

Section: Related Work

mentioning

confidence: 99%

Dynamic Causal Effects Evaluation in A/B Testing with a Reinforcement Learning Framework

Shi, Wang, Luo, et al. 2022

Journal of the American Statistical Association

A/B testing, or online experiment, is a standard business strategy to compare a new product with an old one in pharmaceutical, technological, and traditional industries. Major challenges arise in online experiments of two-sided marketplace platforms (e.g., Uber) where there is only one unit that receives a sequence of treatments over time. In those experiments, the treatment at a given time impacts the current outcome as well as future outcomes. The aim of this article is to introduce a reinforcement learning framework for carrying out A/B testing in these experiments, while characterizing the long-term treatment effects. Our proposed testing procedure allows for sequential monitoring and online updating. It is generally applicable to a variety of treatment designs in different industries. In addition, we systematically investigate the theoretical properties (e.g., size and power) of our testing procedure. Finally, we apply our framework to both simulated data and a real-world data example obtained from a technological company to illustrate its advantage over the current practice. A Python implementation of our test is available at https://github.com/callmespring/CausalRL.

“…Recently, a number of proposals utilize reinforcement learning in mobile health or two-sided markets (Ertefaie, 2014; Luckett et al, 2019; Chen et al, 2020; Hu et al, 2019; Liao et al, 2020; Wang et al, 2021; Zhou et al, 2021; Li et al, 2022a,b; Liao et al, 2022; Shi et al, 2022a,b). In addition, there is a growing literature on adapting reinforcement learning to develop dynamic treatment regimes in precision medicine, to recommend treatment decisions based on individual patients' information (Murphy, 2003; Chakraborty et al, 2010; Qian and Murphy, 2011; Zhao et al, 2012; Zhang et al, 2013; Song et al, 2015; Zhao et al, 2015; Zhang et al, 2015, 2018; Zhu et al, 2017; Wang et al, 2018; Shi et al, 2018a,b; Mo et al, 2020; Meng et al, 2020; Cai et al, 2021; Fang et al, 2021). All these methods considered a single-agent setup where only one agent exists in the environment.…”

mentioning

confidence: 99%

A multiagent reinforcement learning framework for off-policy evaluation in two-sided markets

Shi, Wan, Song, et al. 2023

Ann. Appl. Stat.

Two-sided markets such as ride-sharing companies often involve a group of subjects who are making sequential decisions across time and/or location. With the rapid development of smartphones and the Internet of Things, these companies have substantially transformed the transportation landscape. In this paper we consider large-scale fleet management in ride-sharing companies that involve multiple units in different areas receiving sequences of products (or treatments) over time. Major technical challenges, such as policy evaluation, arise in those studies because (i) spatial and temporal proximities induce interference between locations and times; and (ii) the large number of locations results in the curse of dimensionality. To address both challenges simultaneously, we introduce a multi-agent reinforcement learning (MARL) framework for carrying out policy evaluation in these studies. We propose novel estimators for mean outcomes under different products that are consistent despite the high dimensionality of the state-action space. The proposed estimators perform favorably in simulation experiments. We further illustrate our method using a real dataset obtained from a two-sided marketplace company to evaluate the effects of applying different subsidizing policies. A Python implementation of our proposed method is available at https://github.com/RunzheStat/CausalMARL.
