2021
DOI: 10.1007/978-3-030-86520-7_38

Off-Policy Differentiable Logic Reinforcement Learning


Citations: cited by 15 publications (19 citation statements)
References: 12 publications

“…Many estimators for $G^H_\phi$ have been proposed [11], [14], such as the per-trajectory IS estimator, per-step estimator, weighted estimator, and doubly robust estimator [15], [16]. However, these estimators cannot be applied directly because, in the node dropout setting, the state-action-to-state transition terms must be handled appropriately.…”
Section: B. Transformed Policy Importance Sampling (mentioning)
confidence: 99%
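
The estimators named in this statement are standard off-policy evaluation (OPE) tools. As a minimal sketch, and only under generic assumptions (the function and variable names below are illustrative, not taken from the cited paper, and no node-dropout correction is applied), the per-trajectory and per-step importance sampling (IS) estimators can be written as:

```python
# Sketch of two standard IS estimators for off-policy evaluation on logged
# trajectories. rho holds per-step ratios pi(a_t|s_t)/mu(a_t|s_t); names and
# array shapes are assumptions made for this illustration.
import numpy as np

def per_trajectory_is(rho, rewards, gamma=1.0):
    """rho, rewards: (n, H) arrays for n logged trajectories of horizon H."""
    H = rewards.shape[1]
    discounts = gamma ** np.arange(H)
    w = np.prod(rho, axis=1)                      # one weight per trajectory
    returns = (discounts * rewards).sum(axis=1)   # discounted return per trajectory
    return np.mean(w * returns)

def per_step_is(rho, rewards, gamma=1.0):
    """Weights each reward only by the ratios up to that step, which
    typically lowers variance relative to per-trajectory IS."""
    H = rewards.shape[1]
    discounts = gamma ** np.arange(H)
    w_cum = np.cumprod(rho, axis=1)               # rho_0 * ... * rho_t
    return np.mean((discounts * w_cum * rewards).sum(axis=1))
```

The weighted (self-normalized) and doubly robust variants replace the simple averaging above with a normalization by the total weight or with a control-variate reward model, respectively.
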
“…$\sum_{h=0}^{H-1} \mathbb{V}(\hat{J}_h(\pi))$. However, we can still compute the variance and a variance estimator via a recursive form (Jiang and Li, 2016).…”
Section: Extension To RL (mentioning)
confidence: 99%
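
The recursive form referred to here is, in the spirit of Jiang and Li (2016), a backward recursion over the horizon; the variance of the resulting estimator can be analyzed along the same recursion. A minimal sketch for one logged trajectory, assuming model-based estimates q_hat and v_hat of $Q^\pi$ and $V^\pi$ (names are illustrative):

```python
# Sketch of the recursive doubly robust (DR) estimate for one logged trajectory,
# following the style of Jiang and Li (2016). rho[t] = pi(a_t|s_t) / mu(a_t|s_t);
# q_hat[t] and v_hat[t] are assumed estimates of Q^pi(s_t, a_t) and V^pi(s_t).
def dr_trajectory(rho, rewards, q_hat, v_hat, gamma=1.0):
    """rho, rewards, q_hat, v_hat: length-H sequences for one trajectory."""
    H = len(rewards)
    dr = 0.0                                  # V_DR at the end of the horizon
    for t in reversed(range(H)):              # backward recursion over steps
        dr = v_hat[t] + rho[t] * (rewards[t] + gamma * dr - q_hat[t])
    return dr
```
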
“…OPE has been used successfully for many real world systems, such as recommendation systems (Li et al, 2011) and digital marketing (Thomas et al, 2017), to select a good policy to be deployed in the real world. A variety of estimators have been proposed, particularly those based on importance sampling (IS) (Hammersley and Handscomb, 1964), variants that reduce variance, such as self-normalization (Swaminathan and Joachims, 2015b), direct methods that use reward models, and variance reduction techniques like the doubly robust (DR) estimator (Dudík et al, 2011; Jiang and Li, 2016; Thomas and Brunskill, 2016). Often high-confidence estimation is key, with the goal of estimating confidence intervals around these value estimates that maintain coverage without being too loose (Thomas et al, 2015a,b; Swaminathan and Joachims, 2015a; Kuzborskij et al, 2021).…”
Section: Introduction (mentioning)
confidence: 99%
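
The self-normalization mentioned here (Swaminathan and Joachims, 2015b) divides by the sum of importance weights instead of the sample count, trading a small bias for a large variance reduction. A minimal sketch under the same illustrative shapes as above (names are assumptions, not an exact reproduction of any cited estimator):

```python
# Sketch of a self-normalized (weighted) IS estimator: the weights sum to n only
# in expectation, so dividing by their realized sum stabilizes the estimate.
import numpy as np

def self_normalized_is(rho, rewards, gamma=1.0):
    """rho, rewards: (n, H) arrays of per-step importance ratios and rewards."""
    H = rewards.shape[1]
    discounts = gamma ** np.arange(H)
    w = np.prod(rho, axis=1)                      # trajectory-level weights
    returns = (discounts * rewards).sum(axis=1)
    return (w * returns).sum() / w.sum()          # normalize by realized total weight
```
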
“…The IPS estimator often faces high variance [7], which can be reduced by a self-normalized inverse propensity scoring (SNIPS) estimator [30]. Furthermore, the Doubly Robust (DR) estimator [11,38] is proposed to simultaneously consider imputation errors and propensities in a doubly robust way, reducing the high variance of IPS.…”
Section: Recommendation With Selection Bias (mentioning)
confidence: 99%
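
For the one-step (bandit-style) recommendation setting described here, a doubly robust estimator combines an imputation model with an inverse-propensity correction, so it remains consistent if either the imputed rewards or the propensities are accurate. The sketch below is a generic form of this idea, not the exact estimator of [11] or [38]; r_hat, propensities, and the deterministic target policy are assumptions made for illustration.

```python
# Sketch of a doubly robust value estimate from logged bandit-style feedback:
# imputed reward for the target policy's action plus a propensity-weighted
# correction on the logged action. Names and inputs are illustrative.
import numpy as np

def doubly_robust_bandit(logged_actions, target_actions, rewards, propensities, r_hat):
    """r_hat: (n, n_actions) imputed rewards; propensities: P(logged action | context)."""
    n = len(rewards)
    idx = np.arange(n)
    direct = r_hat[idx, target_actions]                       # imputation term
    match = (logged_actions == target_actions).astype(float)  # 1{a_i = pi(x_i)}
    correction = match / propensities * (rewards - r_hat[idx, logged_actions])
    return np.mean(direct + correction)
```
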
“…The robustness and accuracy of inverse probability estimation are key to counterfactual learning for recommendation systems. Imputation errors and propensities are considered simultaneously in a doubly robust way for recommendation on MNAR data [38] and for reinforcement learning [11].…”
Section: Introduction (mentioning)
confidence: 99%