2021
DOI: 10.1007/978-3-030-89817-5_16
Causal Based Action Selection Policy for Reinforcement Learning

Cited by 3 publications (6 citation statements)
References 11 publications
“…The causal information is used for different purposes: to deal with latent confounders in different settings such as Multi-Armed Bandits (MAB) [15]-[20], MDPs [5], [21], [22], and off-policy evaluation (OPE) [23]; to mitigate heterogeneity and data scarcity [6]; or to derive causal explanations about the behavior of model-free RL agents [7]. More closely related to our work, [10] and [9] show that policy learning in goal-conditioned MDP settings can be sped up via causal knowledge. They took inspiration from model-based reinforcement learning (MBRL), e.g., [24]-[26], but the structure of the given models and the way they are used are different.…”
Section: Related Work
confidence: 99%
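The idea of speeding up policy learning with causal knowledge can be sketched as follows. This is a minimal illustration, not the implementation from any of the cited papers: the action set, the `CAUSALLY_RELEVANT` set, and all names are hypothetical, and the assumed mechanism is simply that a (possibly partial) causal model prunes actions that cannot influence the goal variable, shrinking the exploration space.

```python
import random

# Hypothetical toy setting: 5 discrete actions, but a partial causal
# model tells us that only actions 0 and 2 can influence the goal
# variable. Restricting both exploration and exploitation to these
# causally relevant actions shrinks the effective search space.
ACTIONS = [0, 1, 2, 3, 4]
CAUSALLY_RELEVANT = {0, 2}  # assumed output of a learned causal graph

def select_action(q_values, state, epsilon=0.1):
    """Epsilon-greedy selection over the causally pruned action set."""
    candidates = [a for a in ACTIONS if a in CAUSALLY_RELEVANT]
    if random.random() < epsilon:
        return random.choice(candidates)  # explore, but only relevant actions
    # Exploit: highest Q-value among causally relevant actions.
    return max(candidates, key=lambda a: q_values.get((state, a), 0.0))
```

Under this sketch, even a partially correct causal model helps as long as the pruned set still contains the optimal action.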
See 4 more Smart Citations
“…For that reason we cannot guarantee that the learned models are complete. However, it has been shown in [11] and [10] that partially correct causal models are enough to speed up policy learning. In our experiments we use the score-based structure learning algorithm Tabu Search (TABU) [38] as implemented in the bnlearn R package [39].…”
Section: Causal Discovery (CD)
confidence: 99%
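The flavor of score-based structure learning with tabu search can be sketched as follows. This is a self-contained toy, not bnlearn's TABU: a real implementation scores candidate DAGs with BIC/BDe on data, whereas here a stand-in score rewards a fixed hypothetical edge set, and all names are assumptions for illustration.

```python
# Tabu-style structure search sketch: greedily toggle single directed
# edges to maximize a network score, keeping a tabu list of recent
# moves so the search can escape local optima.

TRUE_EDGES = {("A", "B"), ("B", "C")}  # hypothetical ground truth

def score(edges):
    # Stand-in score: +1 per true edge recovered, -1 per spurious edge.
    # A real score-based learner would use BIC/BDe computed from data.
    return len(edges & TRUE_EDGES) - len(edges - TRUE_EDGES)

def is_acyclic(edges, nodes):
    # Kahn-style check: a graph is a DAG iff we can repeatedly remove
    # nodes with no incoming edges until none remain.
    remaining = set(nodes)
    while remaining:
        free = [n for n in remaining
                if not any(v == n and u in remaining for (u, v) in edges)]
        if not free:
            return False
        remaining -= set(free)
    return True

def tabu_search(nodes, iters=50, tabu_len=5):
    edges, tabu = set(), []
    best, best_score = set(edges), score(edges)
    for _ in range(iters):
        # Evaluate every non-tabu single-edge toggle that keeps a DAG.
        candidates = []
        for m in ((u, v) for u in nodes for v in nodes if u != v):
            if m in tabu:
                continue
            nxt = edges ^ {m}  # toggle edge m on/off
            if is_acyclic(nxt, nodes):
                candidates.append((score(nxt), m, nxt))
        if not candidates:
            break
        s, m, nxt = max(candidates)  # best admissible move (may worsen)
        edges = nxt
        tabu = (tabu + [m])[-tabu_len:]  # remember the move for a while
        if s > best_score:
            best, best_score = set(edges), s
    return best
```

The tabu list is what distinguishes this from plain hill climbing: the search is forced to take the best admissible move even when it worsens the score, which lets it leave local optima.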
See 3 more Smart Citations