2021
DOI: 10.48550/arxiv.2111.04835
|View full text |Cite
Preprint
|
Sign up to set email alerts
|

Safe Data Collection for Offline and Online Policy Learning

Abstract: Motivated by practical needs in online experimentation and off-policy learning, we study the problem of safe optimal design, where we develop a data logging policy that efficiently explores while achieving competitive rewards with a baseline production policy. We first show, perhaps surprisingly, that a common practice of mixing the production policy with uniform exploration, despite being safe, is sub-optimal in maximizing information gain. Then we propose a safe optimal logging policy for the case when no si… Show more

Help me understand this report

Search citation statements

Order By: Relevance

Paper Sections

Select...

Citation Types

0
0
0

Publication Types

Select...

Relationship

0
0

Authors

Journals

citations
Cited by 0 publications
references
References 20 publications
0
0
0
Order By: Relevance

No citations

Set email alert for when this publication receives citations?