2021
DOI: 10.48550/arXiv.2102.03718
Preprint

An Analysis of Frame-skipping in Reinforcement Learning

Abstract: In the practice of sequential decision making, agents are often designed to sense state at regular intervals of 𝑑 time steps, 𝑑 > 1, ignoring state information in between sensing steps. While it is clear that this practice can reduce sensing and compute costs, recent results indicate a further benefit. On many Atari console games, reinforcement learning (RL) algorithms deliver substantially better policies when run with 𝑑 > 1; in fact, with 𝑑 even as high as 180. In this paper, we investigate the role of the…
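The mechanism the abstract describes, acting only every 𝑑 steps and ignoring state in between, is commonly implemented as an action-repeat wrapper around the environment. Below is a minimal sketch assuming the Gymnasium API; the class name FrameSkip and the default d = 4 are illustrative choices, not prescribed by the paper (the canonical Atari setup additionally max-pools the last two frames, omitted here for brevity).

import gymnasium as gym

class FrameSkip(gym.Wrapper):
    """Repeat the chosen action for d consecutive environment steps,
    summing the rewards and returning only the final observation."""

    def __init__(self, env, d=4):
        super().__init__(env)
        assert d >= 1
        self.d = d

    def step(self, action):
        total_reward = 0.0
        obs, terminated, truncated, info = None, False, False, {}
        for _ in range(self.d):
            obs, reward, terminated, truncated, info = self.env.step(action)
            total_reward += reward
            if terminated or truncated:
                break  # stop repeating if the episode ends mid-skip
        return obs, total_reward, terminated, truncated, info

An algorithm run on FrameSkip(env, d) then senses state and receives an aggregated reward only every d time steps, which is the setting whose effect on policy quality the paper analyzes.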

Cited by 2 publications (2 citation statements)
References 6 publications (9 reference statements)
“…This technique is known as "frame-skip" in RL and is an effective method to enhance learning for problems with discrete actions, see e.g. learning to play Atari games [31], but also for continuous control [32]. While the exact mechanisms behind the improvements stemming from frame-skipping are not fully understood, it is clear that in certain problems it increases the signal-to-noise ratio of every data sample, which simplifies the credit assignment problem.…”
Section: The Recomputation Policy
Mentioning confidence: 99%
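The signal-to-noise observation in this statement can be made concrete with a toy calculation. The sketch below is our own illustrative assumption (constant per-step reward signal r plus i.i.d. zero-mean noise of variance σ²), not a derivation from the cited works: summing rewards over d repeated steps scales the signal by d but the noise standard deviation only by √d.

\[
R_d = \sum_{t=1}^{d} (r + \varepsilon_t), \qquad
\mathbb{E}[R_d] = d\,r, \quad
\operatorname{Var}(R_d) = d\,\sigma^2
\;\Rightarrow\;
\mathrm{SNR}(R_d) = \frac{d\,r}{\sqrt{d}\,\sigma} = \sqrt{d}\,\frac{r}{\sigma}.
\]

Under these assumptions, each data sample gathered with frame-skip d carries a √d-stronger reward signal relative to its noise, which is one hedged reading of the quoted claim.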
“…Continuous-time control problems, instead, are usually addressed by means of time discretization, which induces a specific control frequency f or, equivalently, a time step δ = 1/f (Park, Kim, and Kim 2021). This represents an environment hyperparameter, which may have dramatic effects on the process of learning the optimal policy (Metelli et al. 2020; Kalyanakrishnan et al. 2021). Indeed, higher frequencies allow for greater control opportunities, but they have significant drawbacks.…”
Section: Introduction
Mentioning confidence: 99%
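To make the frequency/time-step relationship in this statement concrete, here is a small self-contained sketch (toy dynamics and names are our own, not from the cited papers) showing how choosing a control frequency f fixes the discretization step δ = 1/f and therefore how many decisions the agent makes per unit of real time.

import numpy as np

def simulate(policy, x0, f=10.0, horizon_s=5.0):
    """Roll out the toy continuous-time system x' = -x + u under Euler
    discretization with time step delta = 1/f. The control frequency f
    is an environment hyperparameter: the same policy acts f * horizon_s
    times over the same wall-clock horizon."""
    delta = 1.0 / f                # time step induced by frequency f
    steps = int(horizon_s * f)     # number of discrete decisions
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        u = policy(x)              # one action per sensing step
        x = x + delta * (-x + u)   # Euler step of x' = -x + u
    return x

For instance, simulate(lambda x: -0.5 * x, x0=[1.0], f=10.0) and the same call with f=1000.0 integrate the same dynamics over the same horizon, but give the policy 50 versus 5000 decision points, illustrating why f can dramatically affect learning.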