2017
DOI: 10.48550/arxiv.1712.00378
Preprint

Time Limits in Reinforcement Learning

Abstract: In reinforcement learning, it is common to let an agent interact for a fixed amount of time with its environment before resetting it and repeating the process in a series of episodes. The task that the agent has to learn can either be to maximize its performance over (i) that fixed period, or (ii) an indefinite period where time limits are only used during training to diversify experience. In this paper, we provide a formal account for how time limits could effectively be handled in each of the two cases and e…
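The two cases in the abstract differ mainly in how a timeout should be treated when forming bootstrapped value targets. The sketch below is a minimal illustration of that distinction using a one-step TD target; it is not the authors' implementation, and all names are assumptions.

```python
# Minimal sketch (assumed names, not the authors' code): how a one-step TD
# target can treat the two cases from the abstract differently.
def td_target(reward, next_value, terminated, timed_out, gamma=0.99,
              task_is_time_limited=False):
    """Compute a one-step TD target r + gamma * V(s') with termination handling.

    Case (i):  the task itself is to maximize return over the fixed period,
               so a timeout is a genuine terminal event (no bootstrapping).
    Case (ii): the task is indefinite and the time limit only exists to
               diversify training experience, so the agent should still
               bootstrap from the value of the state reached at the timeout.
    """
    if terminated:                      # environmental termination (e.g. goal reached)
        return reward
    if timed_out and task_is_time_limited:
        return reward                   # case (i): timeout is a real terminal state
    return reward + gamma * next_value  # case (ii) or an ordinary mid-episode step
```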

Cited by 11 publications (16 citation statements)
References 22 publications

“…Negative reward is given to an invalid action which either attempts to insert when m_k = 0 or leads to a trapped state. The formulation is analogous to an agent navigating a grid world with random obstacles and a limited number of steps, where the grids are replaced with different realizations of the connectome [35]. The reward function is intended to guide the DQN agent to modify the connectome at appropriate steps, leading to maximum reward at the end step while avoiding inserting either too many or too few connections.…”
Section: Connectome
confidence: 99%
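For concreteness, a reward of this shape (invalid actions penalized, the end-of-episode outcome rewarded, under a fixed step budget) could look like the sketch below. This is a hypothetical illustration of the formulation described in the quote, not code from the cited work; the penalty value, step budget, and helper names are all assumptions.

```python
# Hypothetical sketch of the reward scheme described above: invalid actions
# (inserting when m_k == 0, or reaching a trapped state) get a negative reward,
# and only the final step scores the resulting connectome.
INVALID_PENALTY = -1.0
MAX_STEPS = 20  # fixed step budget, as in a grid world with limited steps

def step_reward(step, is_valid, is_trapped, final_score=None):
    if not is_valid or is_trapped:
        return INVALID_PENALTY          # discourage invalid insertions
    if step == MAX_STEPS - 1 and final_score is not None:
        return final_score              # reward the final connectome at the end step
    return 0.0                          # intermediate steps carry no reward
```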
“…Our work is most closely related to Pardo et al. (2017) and Zintgraf et al. (2019). Pardo et al. (2017) study the impact of fixed time limits and time-awareness on deep reinforcement learning agents. They propose using a timestamp as part of the state representation in order to avoid state-aliasing and the non-Markovianity resulting from a finite-horizon treatment of an infinite-horizon problem.…”
Section: Related Work and Background
confidence: 99%
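One common way to realize the time-awareness described above is to append the normalized remaining time to the observation. The wrapper below is a minimal sketch of that idea, assuming a Gymnasium environment with a Box observation space; the class name and the particular normalization are assumptions, not the paper's code.

```python
# Sketch (assumed names): append remaining time to the observation so the
# agent can distinguish otherwise-aliased states near the time limit.
import numpy as np
import gymnasium as gym


class TimeAwareObs(gym.ObservationWrapper):
    def __init__(self, env, time_limit):
        super().__init__(env)
        self.time_limit = time_limit
        self._t = 0
        # Extend the Box observation space with one extra feature in [0, 1].
        low = np.append(env.observation_space.low, 0.0)
        high = np.append(env.observation_space.high, 1.0)
        self.observation_space = gym.spaces.Box(low=low, high=high, dtype=np.float32)

    def reset(self, **kwargs):
        self._t = 0
        obs, info = self.env.reset(**kwargs)
        return self.observation(obs), info

    def step(self, action):
        self._t += 1
        obs, reward, terminated, truncated, info = self.env.step(action)
        return self.observation(obs), reward, terminated, truncated, info

    def observation(self, obs):
        # Remaining time as a fraction in [0, 1], appended as an extra feature.
        remaining = 1.0 - self._t / self.time_limit
        return np.append(obs, remaining).astype(np.float32)
```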
“…The episode terminates once the agent is within 1 meter of the goal. We also terminate if the agent has failed to reach the goal after 20 time steps, but treat the two types of termination differently when computing the TD error (see Pardo et al. (2017)). Note that it is challenging to specify a meaningful distance metric and local policy on pixel inputs, so it is difficult to apply standard planning algorithms to this task.…”
Section: Didactic Example: 2D Navigation
confidence: 99%
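"Treat the two types of termination differently" here maps onto which transitions are bootstrapped in the TD target. The sketch below shows one way this is commonly done in a DQN-style update, assuming PyTorch and a replay batch that records whether an episode ended by reaching the goal ('terminated') or by hitting the 20-step limit ('truncated'); all names are illustrative, not taken from the cited code.

```python
# Sketch (assumed names): bootstrapping past timeouts in a DQN-style TD error.
import torch

def dqn_td_error(q_net, target_net, batch, gamma=0.99):
    """batch holds tensors: 'obs', 'action', 'reward', 'next_obs',
    'terminated' (goal reached) and 'truncated' (20-step limit hit)."""
    q = q_net(batch["obs"]).gather(1, batch["action"].unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        next_q = target_net(batch["next_obs"]).max(dim=1).values
    # Only a true environmental termination stops bootstrapping; hitting the
    # time limit ('truncated') still bootstraps from the next state's value,
    # in the spirit of Pardo et al. (2017).
    bootstrap = 1.0 - batch["terminated"].float()
    target = batch["reward"] + gamma * bootstrap * next_q
    return q - target
```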