2022
DOI: 10.1609/aaai.v36i7.20753

Constraint Sampling Reinforcement Learning: Incorporating Expertise for Faster Learning

Abstract: Online reinforcement learning (RL) algorithms are often difficult to deploy in complex human-facing applications as they may learn slowly and have poor early performance. To address this, we introduce a practical algorithm for incorporating human insight to speed learning. Our algorithm, Constraint Sampling Reinforcement Learning (CSRL), incorporates prior domain knowledge as constraints/restrictions on the RL policy. It takes in multiple potential policy constraints to maintain robustness to misspecification …

citations
Cited by 4 publications
(2 citation statements)
references
References 25 publications
“…Other work has investigated deep RL techniques with online RL techniques. Mu et al (2022) used deep knowledge tracing to generate synthetic data for online deep RL-based pedagogical planning. Similar approaches have been taken by others (Bassen et al 2020;Zhang et al 2022).…”
Section: Related Work (mentioning)
confidence: 99%
“…Expertise Incorporated Learning: In order to advance the use of domain adaptation for wireless networks, we consider constraint sampling reinforcement learning (CSRL) [98] as a promising method to quantify the reality gap using domain expertise. In this way, expert knowledge of the networking environment can be integrated during the training process via sensing [51], [55] to enable effective policy transfer in unknown environments with minimal human interaction.…”
Section: B Research Opportunities (mentioning)
confidence: 99%