Dylan R. Ashley scite author profile

Upside-Down Reinforcement Learning (UDRL) is an approach for solving RL problems that does not require value functions and uses only supervised learning, where the targets for given inputs in a dataset do not change over time [4,5]. Ghosh et al. [2] proved that Goal-Conditional Supervised Learning (GCSL), which can be viewed as a simplified version of UDRL, optimizes a lower bound on goal-reaching performance. This raises expectations that such algorithms may enjoy guaranteed convergence to the optimal policy in arbitrary environments, similar to certain well-known traditional RL algorithms. Here we show that for a specific episodic UDRL algorithm (eUDRL, including GCSL), this is not the case, and give the causes of this limitation. To do so, we first introduce a helpful rewrite of eUDRL as a recursive policy update. This formulation helps to disprove its convergence to the optimal policy for a wide class of stochastic environments. Finally, we provide a concrete example of a very simple environment where eUDRL diverges. Since the primary aim of this paper is to present a negative result, and the best counterexamples are the simplest ones, we restrict all discussions to finite (discrete) environments, ignoring issues of function approximation and limited sample size.


Learning to select mates in artificial life

Ashley, Chockalingam, Kuzma, et al. 2019


On Narrative Information and the Distillation of Stories

Ashley, Herrmann, Friggstad, et al. 2022 (preprint)




Dylan R. Ashley

The Alberta Workloads for the SPEC CPU 2017 Benchmark Suite

Upside-Down Reinforcement Learning Can Diverge in Stochastic Environments With Episodic Resets

Learning to select mates in artificial life

On Narrative Information and the Distillation of Stories
