2021
DOI: 10.1609/aaai.v35i10.17096

Advice-Guided Reinforcement Learning in a non-Markovian Environment

Abstract: We study a class of reinforcement learning tasks in which the agent receives its reward for complex, temporally-extended behaviors sparsely. For such tasks, the problem is how to augment the state-space so as to make the reward function Markovian in an efficient way. While some existing solutions assume that the reward function is explicitly provided to the learning algorithm (e.g., in the form of a reward machine), others learn the reward function from the interactions with the environment, assuming no pr…
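The abstract describes augmenting the state space so that a non-Markovian reward becomes Markovian, typically by running the environment in product with a reward machine. Below is a minimal sketch of that product construction; the toy reward machine, the `env_step` interface, and the `labeling_fn` are hypothetical illustrations, not the paper's implementation.

```python
# Minimal sketch (assumed, not from the paper): augment the environment state
# with a reward-machine state so the reward depends only on the augmented state.

class RewardMachine:
    """Toy reward machine: reward 1.0 only after seeing event "a" and then "b"."""

    def __init__(self):
        # transitions[(u, event)] -> (next_u, reward); unlisted events self-loop
        self.transitions = {
            (0, "a"): (1, 0.0),  # first milestone reached, no reward yet
            (1, "b"): (2, 1.0),  # temporally-extended behavior completed
        }
        self.initial_state = 0

    def step(self, u, event):
        return self.transitions.get((u, event), (u, 0.0))


def augmented_step(env_step, rm, s, u, action, labeling_fn):
    """One step in the product of the environment and the reward machine.

    The pair (s, u) is the augmented state; the returned reward is a function
    of that pair alone, so standard Markovian RL algorithms can be applied.
    `env_step` and `labeling_fn` are assumed interfaces for illustration.
    """
    s_next, _, done, info = env_step(s, action)   # hypothetical env interface
    event = labeling_fn(s, action, s_next)        # maps a transition to an event
    u_next, reward = rm.step(u, event)
    return (s_next, u_next), reward, done, info
```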

Cited by 13 publications (8 citation statements)
References 18 publications
“…While most literature assumes a reward machine is given (Icarte et al 2018; Hahn et al 2019), the problem of learning the machine from observations has only been considered recently. The work in (Xu et al 2020; Neider et al 2021) explores a solving approach based on satisfiability. In (Icarte et al 2019), the problem of learning a reward machine is viewed through the lens of discrete optimization.…”
Section: Related Work
confidence: 99%
“…However, many problem domains require taking into account this history. Examples include learning in settings with sparse rewards (Neider et al 2021), rewards defined as regular expressions and formal logics (Camacho et al 2019), and decision-making under partial observability (Icarte et al 2019). In such settings, the non-Markovian reward signal may not be known, but can be learned from traces of behavior.…”
Section: Introduction
confidence: 99%
“…In most real-world problems, the rewards do not depend on the immediate state and the chosen action but rather on the agent's visited states and performed actions. In such environments, the Markovian assumption (MA) does not hold, and the reward function has a temporal nature [31], in that the agent receives its rewards sparsely for complex, temporally-extended behaviors. For example, a robot should be rewarded for delivering coffee only if a user previously requested it.…”
Section: Introduction
confidence: 99%
“…We use a Transformer-based architecture [36] to capture long-horizon dependencies for histories of state-action pairs, which include all reward-relevant historical information. Moreover, in non-Markovian environments, the agent receives its reward sparsely for complex actions over a long period of time [31], which is not conducive to training the Transformer-based policy representation model. We construct a sample categorical distribution to sample higher cumulative reward trajectories with a higher probability.…”
Section: Introduction
confidence: 99%
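The last statement describes sampling stored trajectories with probability increasing in their cumulative reward. A hedged sketch of one way to realize such a categorical distribution is shown below; the softmax weighting, the temperature parameter, and the replay-buffer layout are assumptions for illustration, not details taken from the cited work.

```python
import numpy as np

def sample_trajectories(buffer, returns, n_samples, temperature=1.0, rng=None):
    """Sample trajectories with probability increasing in cumulative reward.

    buffer: list of trajectories; returns: their cumulative rewards.
    A softmax over returns (a modeling assumption here) favors higher-return,
    sparsely rewarded behaviors when building training batches.
    """
    rng = rng or np.random.default_rng()
    returns = np.asarray(returns, dtype=float)
    logits = (returns - returns.max()) / temperature  # numerically stable softmax
    probs = np.exp(logits)
    probs /= probs.sum()
    idx = rng.choice(len(buffer), size=n_samples, p=probs)
    return [buffer[i] for i in idx]
```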