2022
DOI: 10.1613/jair.1.13673
Towards Continual Reinforcement Learning: A Review and Perspectives

Abstract: In this article, we aim to provide a literature review of different formulations and approaches to continual reinforcement learning (RL), also known as lifelong or non-stationary RL. We begin by discussing our perspective on why RL is a natural fit for studying continual learning. We then provide a taxonomy of different continual RL formulations by mathematically characterizing two key properties of non-stationarity, namely, the scope and driver of non-stationarity. This offers a unified view of various formulati…
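To make the abstract's two taxonomy axes concrete, here is a minimal, hypothetical sketch (not taken from the paper) of a non-stationary environment in which the scope of change is the reward function and the driver is time alone, independent of the agent's behaviour. The class name `DriftingRewardBandit` and all constants are illustrative assumptions.

```python
import numpy as np

# Hedged illustration (not from the paper): a toy non-stationary problem where
# only the reward function changes (the "scope" of non-stationarity) and the
# change is driven purely by time, not by the agent's actions (the "driver").

class DriftingRewardBandit:
    """Two-armed bandit whose mean rewards slowly swap over time."""

    def __init__(self, horizon=10_000, seed=0):
        self.rng = np.random.default_rng(seed)
        self.horizon = horizon
        self.t = 0

    def step(self, action):
        # Mean reward of each arm drifts sinusoidally with time `t`:
        # the environment changes regardless of what the agent does.
        phase = 2 * np.pi * self.t / self.horizon
        means = np.array([np.sin(phase), np.cos(phase)])
        reward = means[action] + self.rng.normal(scale=0.1)
        self.t += 1
        return reward

env = DriftingRewardBandit()
print([env.step(action=0) for _ in range(5)])
```

A stationary learner that converges on arm 0 early will eventually be wrong here, which is the kind of setting the surveyed continual RL formulations are meant to handle.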

Cited by 73 publications (33 citation statements)
References 255 publications
“…An important benefit of using options for exploration is that, by encoding temporally extended behaviours into a set of options, the agent can later leverage a collection of diverse and purposeful behaviours in other tasks. This is particularly important in the face of non-stationary, or continual learning (Khetarpal et al, 2020), and is in direct contrast to several other exploration techniques. Methods such as count-based or error prediction-based methods are more tied to the agent's state visitation distribution and are not that flexible in the face of non-stationarity.…”
Section: Exploration in the Face of Non-stationarity
confidence: 99%
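The excerpt's contrast can be illustrated with a small, hypothetical sketch (not from the cited paper) of the count-based bonus it criticises: bonus(s) = beta / sqrt(N(s)). Counts accumulated before a task change keep the bonus small afterwards, which is why such methods are "tied to the agent's state visitation distribution" under non-stationarity. The constant `beta` and the class name are assumptions.

```python
import math
from collections import defaultdict

# Hedged sketch of a count-based exploration bonus. Visit counts persist
# across task changes, so a state that was well-explored under the old task
# receives almost no exploration incentive after the environment changes.

class CountBonus:
    def __init__(self, beta=0.5):
        self.beta = beta
        self.counts = defaultdict(int)  # N(s): visits per state

    def bonus(self, state):
        self.counts[state] += 1
        return self.beta / math.sqrt(self.counts[state])

explorer = CountBonus()
for _ in range(1000):
    explorer.bonus("s0")  # heavily visited under the old task

# After a task change, "s0" may need re-exploration, but the stale
# count keeps its bonus near zero (~0.016 here):
print(explorer.bonus("s0"))
```

A library of options, by contrast, stores reusable behaviours rather than visitation statistics, so it does not decay in this way when the task shifts.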
“…Deep reinforcement learning approaches have enabled human-like performance on many game tasks and new control policies for complex, high-dimensional spaces [2]. Recently, scaling transformer models has resulted in the creation of large language models and powerful, multi-task foundation models [3].…”
Section: Introduction
confidence: 99%
“…The inspiration for many replay-based methods comes from Complementary Learning Systems (CLS; Khetarpal et al., 2022), which describes learning in mammalian brains. The hippocampus memorises recent observations and replays them to the neocortex, which is a slow statistical learner.…”
Section: Introduction
confidence: 99%
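The hippocampus/neocortex analogy in this excerpt maps directly onto experience replay. Below is a minimal, hypothetical sketch (not the cited paper's implementation): a fast buffer memorises recent transitions and replays random mini-batches to a slow learner. The `capacity` and `batch_size` values are illustrative assumptions, and the slow learner is left as a stub.

```python
import random
from collections import deque

# Hedged sketch of the CLS-inspired replay idea: the buffer plays the role of
# the hippocampus (fast memorisation of recent experience); a slow statistical
# learner, the "neocortex", would train on the sampled mini-batches.

class ReplayBuffer:
    def __init__(self, capacity=10_000):
        self.buffer = deque(maxlen=capacity)  # oldest memories are evicted

    def add(self, transition):
        self.buffer.append(transition)  # (state, action, reward, next_state)

    def sample(self, batch_size=32):
        # Uniform sampling decorrelates consecutive experiences before
        # handing them to the slow learner.
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))

buffer = ReplayBuffer()
for t in range(100):
    buffer.add((t, 0, 1.0, t + 1))  # dummy transitions

batch = buffer.sample()
# A slow learner would now take a small gradient step on `batch`.
print(len(batch), batch[0])
```

Interleaving replayed old experience with new data is what lets the slow learner integrate knowledge without catastrophically forgetting earlier tasks.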