In the practice of sequential decision making, agents are often designed to sense state at regular intervals of d time steps, d > 1, ignoring state information in between sensing steps. While it is clear that this practice can reduce sensing and compute costs, recent results indicate a further benefit. On many Atari console games, reinforcement learning (RL) algorithms deliver substantially better policies when run with d > 1, in fact with d even as high as 180. In this paper, we investigate the role of the parameter d in RL; d is called the "frame-skip" parameter, since states in the Atari domain are images. For evaluating a fixed policy, we observe that under standard conditions, frame-skipping does not affect asymptotic consistency. Depending on other parameters, it can possibly even benefit learning. To use d > 1 in the control setting, one must first specify which d-step open-loop action sequences can be executed in between sensing steps. We focus on "action-repetition", the common restriction of this choice to d-length sequences of the same action. We define a task-dependent quantity called the "price of inertia", in terms of which we upper-bound the loss incurred by action-repetition. We show that this loss may be offset by the gain brought to learning by a smaller task horizon. Our analysis is supported by experiments on different tasks and learning algorithms.
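As a rough illustration of frame-skipping with action-repetition, the following minimal sketch runs a policy that senses state only once per block of steps and repeats its chosen action in between. The frame-skip parameter is named `d` in the code. The toy task `ChainEnv`, the policy `toward_goal`, and the helper `run_with_action_repetition` are hypothetical names introduced here for illustration; they are not taken from the paper.

```python
from typing import Callable, Tuple


class ChainEnv:
    """Hypothetical toy task: the state is an integer that an action of
    +1 or -1 shifts; reward 1 is paid on every step that lands on `goal`."""

    def __init__(self, goal: int = 6, horizon: int = 12) -> None:
        self.goal, self.horizon = goal, horizon
        self.state, self.t = 0, 0

    def reset(self) -> int:
        self.state, self.t = 0, 0
        return self.state

    def step(self, action: int) -> Tuple[int, float, bool]:
        self.state += action
        self.t += 1
        reward = 1.0 if self.state == self.goal else 0.0
        done = self.t >= self.horizon
        return self.state, reward, done


def run_with_action_repetition(
    env: ChainEnv,
    policy: Callable[[int], int],
    d: int,
    gamma: float = 1.0,
) -> Tuple[float, int]:
    """Run one episode, sensing state once every d steps.

    Returns (discounted return, number of sensing steps)."""
    if d < 1:
        raise ValueError("frame-skip parameter d must be at least 1")
    state = env.reset()
    total, discount, senses, done = 0.0, 1.0, 0, False
    while not done:
        action = policy(state)  # the only point where state is sensed
        senses += 1
        for _ in range(d):  # repeat the same action for up to d steps
            # Intermediate states are never shown to the policy.
            state, reward, done = env.step(action)
            total += discount * reward
            discount *= gamma
            if done:
                break
    return total, senses


def toward_goal(state: int, goal: int = 6) -> int:
    """Hypothetical policy: step up while below the goal, otherwise step down."""
    return 1 if state < goal else -1


if __name__ == "__main__":
    for d in (1, 4):
        print(d, run_with_action_repetition(ChainEnv(), toward_goal, d))
```

On this toy task, `d = 1` collects a return of 4.0 using 12 sensing steps, whereas `d = 4` collects 2.0 using only 3. The gap is a small instance of the loss that action-repetition can incur, and the reduced number of decision points reflects the shorter effective task horizon.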