2020
DOI: 10.48550/arxiv.2012.13490
Preprint
Towards Continual Reinforcement Learning: A Review and Perspectives

Abstract: In this article, we aim to provide a literature review of different formulations and approaches to continual reinforcement learning (RL), also known as lifelong or non-stationary RL. We begin by discussing our perspective on why RL is a natural fit for studying continual learning. We then provide a taxonomy of different continual RL formulations and mathematically characterize the non-stationary dynamics of each setting. We go on to discuss evaluation of continual RL agents, providing an overview of benchmarks…

Cited by 27 publications (42 citation statements)
References 205 publications
“…The setting we study in our work shares conceptual similarities with prior work in continual and lifelong learning (Schmidhuber, 1987; Thrun & Mitchell, 1995; Parisi et al, 2019; Hadsell et al, 2020). In the context of reinforcement learning, this work has studied the problem of episodic learning in sequential MDPs (Khetarpal et al, 2020). … Second, the continuing setting (bottom row, (2)), where a floor cleaning robot is tasked with keeping a floor clean and is only evaluated on its cumulative performance (Eq. 2) over the agent's lifetime.…”
Section: Related Work
confidence: 98%
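
The continuing-setting objective the excerpt refers to (its "Eq. 2") is not reproduced on this page; as a hedged sketch, a lifetime cumulative-reward objective and its long-run average-reward counterpart are commonly written as below, where T, R_{t+1}, and π are illustrative symbols rather than the citing paper's own notation:

\[
J_{\text{lifetime}}(\pi) \;=\; \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{T} R_{t+1}\right],
\qquad
\bar{r}(\pi) \;=\; \lim_{T\to\infty} \frac{1}{T}\,\mathbb{E}_{\pi}\!\left[\sum_{t=0}^{T-1} R_{t+1}\right].
\]

In contrast to the episodic setting, performance here is judged over the agent's entire lifetime rather than reset at episode boundaries.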
“…The type of knowledge that is transferred is the set of policies learned in source tasks, which are re-evaluated in the target task and recombined using the GPI procedure. A natural use case for ξ-learning is continual problems (Khetarpal et al, 2020), where an agent has to continually adapt to changing tasks, which in our setting are different reward functions.…”
Section: Related Work
confidence: 99%
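
The GPI (generalized policy improvement) recombination mentioned in the excerpt above acts greedily with respect to the maximum, over all source-task policies, of their action values re-evaluated under the target reward. The following is a minimal sketch of that step only, assuming those re-evaluated values are already available; the function and variable names are illustrative and not taken from the cited ξ-learning implementation.

# Hedged sketch of the GPI action-selection step, not the cited implementation.
import numpy as np

def gpi_action(q_values_per_policy: np.ndarray) -> int:
    """Pick the action maximizing Q over all source policies.

    q_values_per_policy: array of shape (n_policies, n_actions) holding
    Q^{pi_i}(s, a) for the current state s, evaluated with the target reward.
    """
    # GPI: the recombined policy is greedy w.r.t. max_i Q^{pi_i}(s, a).
    best_per_action = q_values_per_policy.max(axis=0)  # shape (n_actions,)
    return int(best_per_action.argmax())

# Toy usage: three source policies, four actions in the current state.
q = np.array([[0.1, 0.4, 0.2, 0.0],
              [0.3, 0.1, 0.5, 0.2],
              [0.0, 0.2, 0.1, 0.6]])
print(gpi_action(q))  # -> 3, the highest-valued action across policies

The resulting policy is guaranteed to perform at least as well as each individual source policy on the target task, which is what makes the recombination attractive for transfer across reward functions.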
“…time. Non-stationarity can arise from diverse causes and can be interpreted as a form of partial knowledge of the environment (Khetarpal et al 2020). Learning in non-stationary environments has been widely addressed in the literature (Garcia and Smith 2000; Ghate and Smith 2013; Lesner and Scherrer 2015).…”
Section: Introduction
confidence: 99%
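
One standard way to make the non-stationarity mentioned above explicit, sketched here with illustrative notation rather than the review's own, is to index the MDP's transition and reward functions by time:

\[
M_t \;=\; \langle \mathcal{S}, \mathcal{A}, P_t, R_t, \gamma \rangle,
\qquad
P_t(s' \mid s, a), \quad R_t(s, a),
\]

so that different continual-RL formulations correspond to different assumptions about how \(P_t\) and \(R_t\) evolve and about how much of that evolution the agent can observe, i.e. its partial knowledge of the environment.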
“…In this sense, Lifelong Learning (LL) can be considered closer to the intuitive idea of learning for human agents. More technically, LL requires the agent to readily adapt its behavior to the evolution of the environment, as well as to keep memory of past behaviors in order to leverage this knowledge in future, similar phases (Khetarpal et al 2020). This represents, indeed, a critical trade-off peculiar to the lifelong setting.…”
Section: Introduction
confidence: 99%
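
The adapt-versus-retain trade-off described in the excerpt above can be illustrated with a simple policy-library pattern: past policies are kept in memory and reused when a previously seen phase recurs, while unseen phases get a freshly adapted copy. This is an illustrative sketch, not a method from the cited works, and all names in it are hypothetical.

# Illustrative sketch (hypothetical names) of retaining past policies for reuse
# in recurring phases while adapting a fresh copy for new phases.
from copy import deepcopy

class PolicyLibrary:
    def __init__(self):
        self._policies = {}  # phase_id -> stored policy (retention)

    def policy_for(self, phase_id, current_policy):
        # Reuse retained knowledge if this phase was seen before...
        if phase_id in self._policies:
            return self._policies[phase_id]
        # ...otherwise start from a copy of the current policy and adapt it.
        fresh = deepcopy(current_policy)
        self._policies[phase_id] = fresh
        return fresh

The design choice captured here is the trade-off itself: more aggressive adaptation of stored policies improves fit to the current phase but risks overwriting knowledge needed when an earlier phase returns.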