2022 · Preprint
DOI: 10.48550/arxiv.2202.05100

Adaptively Exploiting d-Separators with Causal Bandits

Abstract: Multi-armed bandit problems provide a framework to identify the optimal intervention over a sequence of repeated experiments. Without additional assumptions, minimax optimal performance (measured by cumulative regret) is well-understood. With access to additional observed variables that d-separate the intervention from the outcome (i.e., they are a d-separator), recent causal bandit algorithms provably incur less regret. However, in practice it is desirable to be agnostic to whether observed variables are a d-…
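
The abstract's key structural assumption is that observed variables Z d-separate the intervention A from the outcome Y, so Y depends on A only through Z. The sketch below is not the paper's adaptive algorithm; it is a minimal toy illustration (the environment P_Z, mu_Z and the UCB-style confidence width are made up here) of why a d-separator helps: every observed outcome, whichever arm produced it, refines one shared estimate of E[Y | Z], and each arm's mean is then re-composed from its estimated distribution over Z.

```python
# Toy causal bandit where a discrete post-action variable Z d-separates
# the action A from the reward Y (Y depends on A only through Z).
# Hypothetical environment and heuristic confidence width, for illustration only.
import numpy as np

rng = np.random.default_rng(0)

K, M, T = 5, 3, 5000                       # arms, d-separator states, rounds
P_Z = rng.dirichlet(np.ones(M), size=K)    # P(Z | do(A=a)), one row per arm
mu_Z = rng.uniform(0.0, 1.0, size=M)       # E[Y | Z=z], shared by every arm
true_means = P_Z @ mu_Z
best = true_means.max()

# Shared statistics: counts of Z under each arm, and one estimate of E[Y | Z]
# that is refined by every pull regardless of which arm produced it.
z_counts = np.ones((K, M))                 # pseudo-counts of Z per arm
y_sum, y_cnt = np.zeros(M), np.ones(M)     # running estimate of E[Y | Z]

regret = 0.0
for t in range(T):
    p_hat = z_counts / z_counts.sum(axis=1, keepdims=True)   # estimated P(Z | do(a))
    ucb = p_hat @ (y_sum / y_cnt) + np.sqrt(2 * np.log(t + 2) / z_counts.sum(axis=1))
    a = int(np.argmax(ucb))
    z = rng.choice(M, p=P_Z[a])            # draw the d-separator state
    y = rng.normal(mu_Z[z], 0.1)           # reward depends on a only through z
    z_counts[a, z] += 1
    y_sum[z] += y
    y_cnt[z] += 1
    regret += best - true_means[a]

print(f"cumulative regret after {T} rounds: {regret:.1f}")
```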

Cited by 2 publications (2 citation statements) · References 6 publications (11 reference statements)
“…By assuming that actions correspond to interventions in a known causal graph, the effects of different actions become related, allowing for better regret bounds [102,115]. If the causal graph is not assumed to be known, there is an additional exploration-exploitation tradeoff that needs to be taken into account, which has been considered in recent work [97,108,18]. Since certain parts of the causal graph might not be relevant to predicting the effect of an action on some reward, the reinforcement learning setting is another case in which targeted structure learning may be more efficient.…”
Section: Discussion and Open Problems
Mentioning confidence: 99%
“…By assuming that actions correspond to interventions in a known causal graph, the effects of different actions become related, allowing for better regret bounds [102,116]. If the causal graph is not assumed to be known, there is an additional exploration-exploitation trade-off that needs to be taken into account, which has been considered in recent work [18,97,108]. Since certain parts of the causal graph might not be relevant to predicting the effect of an action on some reward, the reinforcement learning setting is another case in which targeted structure learning may be more efficient.…”
Section: Statistical and Computational Complexity of Causal Structure...
Mentioning confidence: 99%
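
The citation statements above note that when actions correspond to interventions in a known causal graph, the effects of different actions become related, which is what enables the improved regret bounds. For contrast with the causal estimator sketched after the abstract, the toy baseline below is a standard UCB1 learner on the same hypothetical environment that ignores Z entirely, so each arm's mean must be estimated from that arm's own pulls alone; the paper's setting, per the abstract, is the harder case where it is unknown in advance whether Z is actually a d-separator.

```python
# Baseline for contrast: plain UCB1 on the same hypothetical environment,
# ignoring the d-separator Z, so no information is shared across arms.
import numpy as np

rng = np.random.default_rng(0)
K, M, T = 5, 3, 5000
P_Z = rng.dirichlet(np.ones(M), size=K)    # P(Z | do(A=a))
mu_Z = rng.uniform(0.0, 1.0, size=M)       # E[Y | Z=z]
true_means = P_Z @ mu_Z
best = true_means.max()

pulls, sums = np.ones(K), np.zeros(K)      # per-arm statistics only
regret = 0.0
for t in range(T):
    ucb = sums / pulls + np.sqrt(2 * np.log(t + 2) / pulls)
    a = int(np.argmax(ucb))
    y = rng.normal(mu_Z[rng.choice(M, p=P_Z[a])], 0.1)
    pulls[a] += 1
    sums[a] += y
    regret += best - true_means[a]

print(f"UCB1 (no causal sharing) regret: {regret:.1f}")
```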