2021
DOI: 10.48550/arXiv.2102.03718
Preprint

An Analysis of Frame-skipping in Reinforcement Learning

Abstract: In the practice of sequential decision making, agents are often designed to sense state at regular intervals of 𝑑 time steps, 𝑑 > 1, ignoring state information in between sensing steps. While it is clear that this practice can reduce sensing and compute costs, recent results indicate a further benefit. On many Atari console games, reinforcement learning (RL) algorithms deliver substantially better policies when run with 𝑑 > 1; in fact, with 𝑑 even as high as 180. In this paper, we investigate the role of the…
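The mechanism the abstract describes, acting only every 𝑑 steps and ignoring state in between, is commonly implemented as an action-repeat wrapper around the environment. Below is a minimal sketch assuming the Gymnasium API; the class name FrameSkip and the default d = 4 are illustrative choices, not prescribed by the paper (the canonical Atari setup additionally max-pools the last two frames, omitted here for brevity).

import gymnasium as gym

class FrameSkip(gym.Wrapper):
    """Repeat the chosen action for d consecutive environment steps,
    summing the rewards and returning only the final observation."""

    def __init__(self, env, d=4):
        super().__init__(env)
        assert d >= 1
        self.d = d

    def step(self, action):
        total_reward = 0.0
        obs, terminated, truncated, info = None, False, False, {}
        for _ in range(self.d):
            obs, reward, terminated, truncated, info = self.env.step(action)
            total_reward += reward
            if terminated or truncated:
                break  # stop repeating if the episode ends mid-skip
        return obs, total_reward, terminated, truncated, info

An algorithm run on FrameSkip(env, d) then senses state and receives an aggregated reward only every d time steps, which is the setting whose effect on policy quality the paper analyzes.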

Cited by 2 publications (2 citation statements)
References 6 publications (9 reference statements)
“…This technique is known as "frame-skip" in RL and is an effective method to enhance learning for problems with discrete actions, see e.g. learning to play Atari games [31], but also for continuous control [32]. While the exact mechanisms behind the improvements stemming from frame-skipping are not fully understood, it is clear that in certain problems it increases the signal-to-noise ratio of every data sample, which simplifies the credit assignment problem.…”
Section: The Recomputation Policy
Mentioning confidence: 99%
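The signal-to-noise observation in this statement can be made concrete with a toy calculation. The sketch below is our own illustrative assumption (constant per-step reward signal r plus i.i.d. zero-mean noise of variance σ²), not a derivation from the cited works: summing rewards over d repeated steps scales the signal by d but the noise standard deviation only by √d.

\[
R_d = \sum_{t=1}^{d} (r + \varepsilon_t), \qquad
\mathbb{E}[R_d] = d\,r, \quad
\operatorname{Var}(R_d) = d\,\sigma^2
\;\Rightarrow\;
\mathrm{SNR}(R_d) = \frac{d\,r}{\sqrt{d}\,\sigma} = \sqrt{d}\,\frac{r}{\sigma}.
\]

Under these assumptions, each data sample gathered with frame-skip d carries a √d-stronger reward signal relative to its noise, which is one hedged reading of the quoted claim.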
“…Continuous-time control problems, instead, are usually addressed by means of time discretization, which induces a specific control frequency f or, equivalently, a time step δ = 1/f (Park, Kim, and Kim 2021). This represents an environment hyperparameter, which may have dramatic effects on the process of learning the optimal policy (Metelli et al. 2020; Kalyanakrishnan et al. 2021). Indeed, higher frequencies allow for greater control opportunities, but they have significant drawbacks.…”
Section: Introduction
Mentioning confidence: 99%
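To make the frequency/time-step relationship in this statement concrete, here is a small self-contained sketch (toy dynamics and names are our own, not from the cited papers) showing how choosing a control frequency f fixes the discretization step δ = 1/f and therefore how many decisions the agent makes per unit of real time.

import numpy as np

def simulate(policy, x0, f=10.0, horizon_s=5.0):
    """Roll out the toy continuous-time system x' = -x + u under Euler
    discretization with time step delta = 1/f. The control frequency f
    is an environment hyperparameter: the same policy acts f * horizon_s
    times over the same wall-clock horizon."""
    delta = 1.0 / f                # time step induced by frequency f
    steps = int(horizon_s * f)     # number of discrete decisions
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        u = policy(x)              # one action per sensing step
        x = x + delta * (-x + u)   # Euler step of x' = -x + u
    return x

For instance, simulate(lambda x: -0.5 * x, x0=[1.0], f=10.0) and the same call with f=1000.0 integrate the same dynamics over the same horizon, but give the policy 50 versus 5000 decision points, illustrating why f can dramatically affect learning.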