2022
DOI: 10.1080/01621459.2022.2027776

Dynamic Causal Effects Evaluation in A/B Testing with a Reinforcement Learning Framework

Abstract: A/B testing, or online experimentation, is a standard business strategy for comparing a new product with an old one in the pharmaceutical, technological, and traditional industries. Major challenges arise in online experiments on two-sided marketplace platforms (e.g., Uber), where there is only one unit that receives a sequence of treatments over time. In such experiments, the treatment at a given time affects the current outcome as well as future outcomes. The aim of this article is to introduce a reinforcement learning framework…
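The abstract's key difficulty, that a treatment applied now also moves future outcomes, can be made concrete with a toy simulation. The sketch below is purely illustrative and is not the article's model: the state dynamics, the `carryover` parameter, and the effect sizes are all invented for exposition. It shows why a naive difference-in-means under an alternating (switchback-style) design can disagree with the long-run always-treat vs. never-treat contrast.

```python
import numpy as np

def simulate(policy, T=500, carryover=0.7, direct_effect=1.0, seed=0):
    """Simulate one experimental unit whose state carries treatment
    effects forward in time. All dynamics are invented for illustration.

    policy(t) -> 0/1 treatment indicator at time t.
    """
    rng = np.random.default_rng(seed)
    state, outcomes, actions = 0.0, [], []
    for t in range(T):
        a = policy(t)
        # Outcome depends on the current treatment AND the state,
        # which accumulates past treatments -- the carryover effect.
        y = direct_effect * a + state + rng.normal(scale=0.5)
        state = carryover * state + 0.5 * a  # past actions persist
        outcomes.append(y)
        actions.append(a)
    return np.array(actions), np.array(outcomes)

# Alternating-time design, as in a switchback experiment.
a, y = simulate(lambda t: t % 2)

# Naive difference-in-means ignores carryover and is biased for the
# long-run effect of always-treat vs. never-treat.
naive = y[a == 1].mean() - y[a == 0].mean()

# Long-run ("policy value") contrast, estimated by brute-force simulation.
_, y1 = simulate(lambda t: 1)
_, y0 = simulate(lambda t: 0)
print(f"naive: {naive:.2f}, long-run: {y1.mean() - y0.mean():.2f}")
```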

Cited by 12 publications (10 citation statements)
References 61 publications
“…OPE via causal inference. Our work is closely related to the line of research that employs tools from causal inference (Pearl, 2009) for studying OPE with unobserved confounders (Oberst and Sontag, 2019; Kallus and Zhou, 2020; Kallus and Zhou, 2021; Mastouri et al., 2021; Shi et al., 2021; Shi et al., 2022). Among them, Shi et al. (2021) is most relevant to our work.…”
Section: Related Work
Citation type: mentioning
confidence: 82%
“…We study a generic experimentation problem within a system represented as a Markov Decision Process (MDP), where treatment corresponds to an action which may interfere with state transitions. This form of interference, which we refer to as Markovian, naturally subsumes the platform examples above, as recently noted by others either implicitly [50] or explicitly [29,55]. In that example, a user arrives at each time step, the platform chooses an action (whether to treat the user), and the user's purchase decision alters the system state (inventory levels).…”
Section: Markovian Interference and Existing Approaches
Citation type: mentioning
confidence: 82%
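The quoted passage describes interference flowing through an MDP state: treating a user changes inventory, which changes what later users face. A minimal sketch of such a system follows, loosely patterned on that inventory example; every number, name, and dynamic below is invented for illustration and is not the cited paper's model.

```python
import numpy as np

def step(inventory, treat, rng, capacity=20):
    """One time step of a toy MDP with Markovian interference: a user
    arrives, the platform decides whether to treat (e.g., offer a
    discount), and a purchase depletes inventory for future users."""
    buy_prob = (0.3 + 0.3 * treat) if inventory > 0 else 0.0
    bought = rng.random() < buy_prob
    reward = (1.0 - 0.2 * treat) * bought  # margin shrinks if discounted
    # Purchases deplete stock; a slow restock kicks in when stock is low.
    inventory = min(capacity, inventory - bought + (inventory < 5))
    return inventory, reward

def run(policy, T=10_000, seed=0):
    rng = np.random.default_rng(seed)
    inv, total = 10, 0.0
    for t in range(T):
        inv, r = step(inv, policy(t), rng)
        total += r
    return total / T  # average reward under the policy

# Treating boosts purchases now but drains inventory, hurting future
# sales, so the per-step contrast under alternation differs from the
# global always-treat vs. never-treat contrast.
print(run(lambda t: t % 2), run(lambda t: 1) - run(lambda t: 0))
```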
“…This tack appears to be promising, e.g. [55], but we observe that the resulting variance is necessarily large (Theorem 3).…”
Section: Off-Policy Evaluation (OPE)
Citation type: mentioning
confidence: 85%
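The variance concern raised here is easy to see with ordinary trajectory-wise importance sampling, the textbook OPE estimator: the importance weight is a product of per-step ratios, so its variance grows exponentially in the horizon. The sketch below is a generic demonstration under assumed behavior and target policies, not a reproduction of either paper's analysis.

```python
import numpy as np

# Importance-sampling OPE on i.i.d. binary actions. The trajectory
# weight is a product of per-step ratios, so its variance blows up
# with the horizon T. Policies and rewards are invented for illustration.
rng = np.random.default_rng(0)
b, pi = 0.5, 0.8  # behavior and target probabilities of taking action 1

for T in (5, 20, 50):
    acts = rng.random((100_000, T)) < b           # trajectories under behavior
    ratio = np.where(acts, pi / b, (1 - pi) / (1 - b))
    w = ratio.prod(axis=1)                        # trajectory IS weight, E[w] = 1
    ret = acts.sum(axis=1).astype(float)          # toy return: # of action-1 steps
    est = w * ret                                 # unbiased for E_pi[return] = pi * T
    print(f"T={T:2d}  mean={est.mean():7.2f}  std={est.std():10.2f}")
```

Even with 100,000 trajectories, the estimator's standard deviation at T=50 dwarfs the quantity being estimated, which is the sense in which the resulting variance is "necessarily large."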
“…Bandit algorithms (Bubeck and Cesa-Bianchi, 2012; Lattimore and Szepesvári, 2020) and reinforcement learning (Sutton and Barto, 2018) are modern strategies for solving sequential decision-making problems. They have received recent attention in the statistics community for business and scientific applications, including dynamic pricing (Wang et al., 2020; Chen, Simchi-Levi and Wang, 2021; Chen, Miao and Wang, 2021; Wang et al., 2021), online decision making (Shi et al., 2020; Chen, Lu and Song, 2021; Chen et al., 2022), dynamic treatment regimes (Qi and Liu, 2018; Luckett et al., 2019; Qi et al., 2020; Qi, Miao and Zhang, 2021), and online causal effects in two-sided markets (Shi et al., 2022b).…”
Citation type: mentioning
confidence: 99%