The theory of reinforcement learning provides a normative account, deeply rooted in psychological and neuroscientific perspectives on animal behaviour, of how agents may optimize their control of an environment. To use reinforcement learning successfully in situations approaching real-world complexity, however, agents are confronted with a difficult task: they must derive efficient representations of the environment from high-dimensional sensory inputs, and use these to generalize past experience to new situations. Remarkably, humans and other animals seem to solve this problem through a harmonious combination of reinforcement learning and hierarchical sensory processing systems, the former evidenced by a wealth of neural data revealing notable parallels between the phasic signals emitted by dopaminergic neurons and temporal difference reinforcement learning algorithms. While reinforcement learning agents have achieved some successes in a variety of domains, their applicability has previously been limited to domains in which useful features can be handcrafted, or to domains with fully observed, low-dimensional state spaces. Here we use recent advances in training deep neural networks to develop a novel artificial agent, termed a deep Q-network, that can learn successful policies directly from high-dimensional sensory inputs using end-to-end reinforcement learning. We tested this agent on the challenging domain of classic Atari 2600 games. We demonstrate that the deep Q-network agent, receiving only the pixels and the game score as inputs, was able to surpass the performance of all previous algorithms and achieve a level comparable to that of a professional human games tester across a set of 49 games, using the same algorithm, network architecture and hyperparameters. This work bridges the divide between high-dimensional sensory inputs and actions, resulting in the first artificial agent that is capable of learning to excel at a diverse array of challenging tasks.
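The abstract above rests on temporal-difference learning: the agent nudges its value estimate for a state-action pair toward the observed reward plus the discounted value of the best next action. A minimal sketch of that update, with a lookup table standing in for the deep Q-network and all numeric values chosen purely for illustration (none come from the paper):

```python
import numpy as np

# Sketch of the temporal-difference Q-learning update underlying a deep
# Q-network. A small table replaces the neural network; states, actions,
# and rewards are illustrative assumptions.
def td_update(Q, state, action, reward, next_state, alpha=0.1, gamma=0.99):
    """Move Q[state, action] toward reward + gamma * max_a' Q[next_state, a']."""
    target = reward + gamma * np.max(Q[next_state])
    Q[state, action] += alpha * (target - Q[state, action])
    return Q

Q = np.zeros((2, 2))  # two states, two actions, all values initially zero
Q = td_update(Q, state=0, action=1, reward=1.0, next_state=1)
```

In the actual agent the table is replaced by a convolutional network over raw pixels, and the same target drives a gradient step on the network's parameters.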
The ability to learn tasks in a sequential fashion is crucial to the development of artificial intelligence. Until now neural networks have not been capable of this, and it has been widely thought that catastrophic forgetting is an inevitable feature of connectionist models. We show that it is possible to overcome this limitation and train networks that can maintain expertise on tasks that they have not experienced for a long time. Our approach remembers old tasks by selectively slowing down learning on the weights important for those tasks. We demonstrate our approach is scalable and effective by solving a set of classification tasks based on a hand-written digit dataset and by learning several Atari 2600 games sequentially.

synaptic consolidation | artificial intelligence | stability plasticity | continual learning | deep learning

Achieving artificial general intelligence requires that agents are able to learn and remember many different tasks (1). This is particularly difficult in real-world settings: The sequence of tasks may not be explicitly labeled, tasks may switch unpredictably, and any individual task may not recur for long time intervals. Critically, therefore, intelligent agents must demonstrate a capacity for continual learning: that is, the ability to learn consecutive tasks without forgetting how to perform previously trained tasks.

Continual learning poses particular challenges for artificial neural networks due to the tendency for knowledge of the previously learned task(s) (e.g., task A) to be abruptly lost as information relevant to the current task (e.g., task B) is incorporated. This phenomenon, termed catastrophic forgetting (2-6), occurs specifically when the network is trained sequentially on multiple tasks because the weights in the network that are important for task A are changed to meet the objectives of task B.
Whereas recent advances in machine learning, and in particular deep neural networks, have resulted in impressive gains in performance across a variety of domains (e.g., refs. 7 and 8), little progress has been made in achieving continual learning. Current approaches have typically ensured that data from all tasks are simultaneously available during training. By interleaving data from multiple tasks during learning, forgetting does not occur, because the weights of the network can be jointly optimized for performance on all tasks. In this regime, often referred to as the multitask learning paradigm, deep-learning techniques have been used to train single agents that can successfully play multiple Atari games (9, 10). If tasks are presented sequentially, multitask learning can be used only if the data are recorded by an episodic memory system and replayed to the network during training. This approach [often called system-level consolidation (4, 5)] is impractical for learning large numbers of tasks, as in our setting it would require the amount of memories being stored and replayed to be proportional to the number of tasks. The lack of algorithms to support continual learning thus rema...
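The mechanism described above, selectively slowing learning on weights important for an old task, can be sketched as a quadratic penalty that anchors important weights near the old-task solution. The per-weight importance values and the penalty strength below are illustrative assumptions, not figures from the text:

```python
import numpy as np

# Sketch of an importance-weighted consolidation penalty: drifting away
# from task A's solution is costly in proportion to each weight's
# importance for task A. All numbers are illustrative assumptions.
def consolidation_penalty(weights, old_weights, importance, lam=1.0):
    """Quadratic cost that grows as important weights drift from the old optimum."""
    return 0.5 * lam * np.sum(importance * (weights - old_weights) ** 2)

theta_A = np.array([1.0, -2.0])      # weights after training on task A
importance = np.array([10.0, 0.1])   # per-weight importance for task A
theta = np.array([1.5, 0.0])         # weights drifting while training task B

penalty = consolidation_penalty(theta, theta_A, importance)
```

Adding this penalty to task B's loss lets unimportant weights move freely while the high-importance weight is effectively frozen, which is the stability-plasticity trade-off the keywords refer to.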
The game of chess is the longest-studied domain in the history of artificial intelligence. The strongest programs are based on a combination of sophisticated search techniques, domain-specific adaptations, and handcrafted evaluation functions that have been refined by human experts over several decades. By contrast, the AlphaGo Zero program recently achieved superhuman performance in the game of Go by reinforcement learning from self-play. In this paper, we generalize this approach into a single AlphaZero algorithm that can achieve superhuman performance in many challenging games. Starting from random play and given no domain knowledge except the game rules, AlphaZero convincingly defeated a world champion program in the games of chess and shogi (Japanese chess), as well as Go.
Amnesic patients have a well-established deficit in remembering their past experiences. Surprisingly, however, the question as to whether such patients can imagine new experiences has not been formally addressed to our knowledge. We tested whether a group of amnesic patients with primary damage to the hippocampus bilaterally could construct new imagined experiences in response to short verbal cues that outlined a range of simple commonplace scenarios. Our results revealed that patients were markedly impaired relative to matched control subjects at imagining new experiences. Moreover, we identified a possible source for this deficit. The patients' imagined experiences lacked spatial coherence, consisting instead of fragmented images in the absence of a holistic representation of the environmental setting. The hippocampus, therefore, may make a critical contribution to the creation of new experiences by providing the spatial context into which the disparate elements of an experience can be bound. Given how closely imagined experiences match episodic memories, the absence of this function, mediated by the hippocampus, may also fundamentally affect the ability to vividly re-experience the past.

episodic | hippocampus | imagination | memory | construction

Each of us has our own unique personal past, comprising a myriad of autobiographical experiences accrued over a lifetime. Recollection of these rich autobiographical or episodic memories has been likened to mentally traveling back in time and re-experiencing one's past (1). It has long been known that the hippocampus and related medial temporal lobe structures play a critical role in supporting episodic memory (2), and damage to even the hippocampus alone is sufficient to cause amnesia (3, 4). Exactly how the hippocampus supports episodic memory (5-7), and indeed whether its involvement is time-limited (5, 8) or permanent (7, 9), remains uncertain, however.
Numerous studies have attempted to settle this debate by ascertaining the status of remote episodic memory in patients with hippocampal amnesia (10), but without resolution thus far. This is not altogether surprising, as studying memory for personal experiences is fraught with methodological issues (11-13), not least of which is how to generalize across individuals when autobiographical memories are unique to each person (9, 14).

We therefore sought to further our understanding of the role of the hippocampus in episodic memory by adopting a different approach. If patients with hippocampal damage are impaired at recollecting past events, we wondered, can they imagine new experiences? While there have been some suggestions that amnesic patients have difficulty envisioning themselves in the future (15-18), surprisingly, the more general question of whether imagining new experiences depends on a functioning hippocampus has not been formally addressed to our knowledge. In fact, episodic memory and imagining or constructing events share striking similarities in terms of the psychological processes engaged (19-21). These include i...
Human choices are remarkably susceptible to the manner in which options are presented. This so-called "framing effect" represents a striking violation of standard economic accounts of human rationality, although its underlying neurobiology is not understood. We found that the framing effect was specifically associated with amygdala activity, suggesting a key role for an emotional system in mediating decision biases. Moreover, across individuals, orbital and medial prefrontal cortex activity predicted a reduced susceptibility to the framing effect. This finding highlights the importance of incorporating emotional processes within models of human choice and suggests how the brain may modulate the effect of these biasing influences to approximate rationality.

A central tenet of rational decision-making is logical consistency across decisions, regardless of the manner in which available choices are presented. This assumption, known as "extensionality" (1) or "invariance" (2), is a fundamental axiom of game theory (3). However, the proposition that human decisions are "description-invariant" is challenged by a wealth of empirical data (4, 5). Kahneman and Tversky originally described this deviation from rational decision-making, which they termed the "framing effect," as a key aspect of prospect theory (6, 7).

Theories of decision-making have tended to emphasize the operation of analytic processes in guiding choice behavior. However, more intuitive or emotional responses can play a key role in human decision-making (8-10). Thus, when making decisions under conditions in which available information is incomplete or overly complex, subjects rely on a number of simplifying heuristics, or efficient rules of thumb, rather than extensive algorithmic processing (11). One suggestion is that the framing effect results from systematic biases in choice behavior arising from an affect heuristic underwritten by an emotional system (12, 13).
However, despite the substantial role of the framing effect in influencing human decision-making, its underlying neurobiological basis is not understood.

We investigated the neurobiological basis of the framing effect by means of functional magnetic resonance imaging (fMRI) and a novel financial decision-making task. Participants (20 university students or graduates) received a message indicating the amount of money that they would initially receive in that trial (e.g., "You receive £50"). Subjects then had to choose between a "sure" option and a "gamble" option presented in the context of two different frames. The "sure" option was formulated as either the amount of money retained from the initial starting amount (e.g., keep £20 of the £50; "Gain" frame) or as the amount of money lost from the initial amount (e.g., lose £30 of the £50; "Loss" frame). The "gamble" option was identical in both frames and was represented as a pie chart depicting the probability of winning or losing (Fig. 1) (14).

The behavioral results indicated that subjects' decisions were significantly affected by our framing...
The fields of neuroscience and artificial intelligence (AI) have a long and intertwined history. In more recent times, however, communication and collaboration between the two fields has become less commonplace. In this article, we argue that better understanding biological brains could play a vital role in building intelligent machines. We survey historical interactions between the AI and neuroscience fields and emphasize current advances in AI that have been inspired by the study of neural computation in humans and other animals. We conclude by highlighting shared themes that may be key for advancing future research in both fields.
Functional MRI (fMRI) studies investigating the neural basis of episodic memory recall, and the related task of thinking about plausible personal future events, have revealed a consistent network of associated brain regions. Surprisingly little, however, is understood about the contributions individual brain areas make to the overall recollective experience. To examine this, we used a novel fMRI paradigm in which subjects had to imagine fictitious experiences. In contrast to future thinking, this results in experiences that are not explicitly temporal in nature or as reliant on self-processing. By using previously imagined fictitious experiences as a comparison for episodic memories, we identified the neural basis of a key process engaged in common, namely scene construction, involving the generation, maintenance and visualization of complex spatial contexts. This was associated with activations in a distributed network, including hippocampus, parahippocampal gyrus, and retrosplenial cortex. Importantly, we disambiguated these common effects from episodic memory-specific responses in anterior medial prefrontal cortex, posterior cingulate cortex and precuneus. These latter regions may support self-schema and familiarity processes, and contribute to the brain's ability to distinguish real from imaginary memories. We conclude that scene construction constitutes a common process underlying episodic memory and imagination of fictitious experiences, and suggest it may partially account for the similar brain networks implicated in navigation, episodic future thinking, and the default mode. We suggest that additional brain regions are co-opted into this core network in a task-specific manner to support functions such as episodic memory that may have additional requirements.
Deep neural networks have achieved impressive successes in fields ranging from object recognition to complex games such as Go. Navigation, however, remains a substantial challenge for artificial agents, with deep neural networks trained by reinforcement learning failing to rival the proficiency of mammalian spatial behaviour, which is underpinned by grid cells in the entorhinal cortex. Grid cells are thought to provide a multi-scale periodic representation that functions as a metric for coding space and is critical for integrating self-motion (path integration) and planning direct trajectories to goals (vector-based navigation). Here we set out to leverage the computational functions of grid cells to develop a deep reinforcement learning agent with mammal-like navigational abilities. We first trained a recurrent network to perform path integration, leading to the emergence of representations resembling grid cells, as well as other entorhinal cell types. We then showed that this representation provided an effective basis for an agent to locate goals in challenging, unfamiliar, and changeable environments, optimizing the primary objective of navigation through deep reinforcement learning. The performance of agents endowed with grid-like representations surpassed that of an expert human and comparison agents, with the metric quantities necessary for vector-based navigation derived from grid-like units within the network. Furthermore, grid-like representations enabled agents to conduct shortcut behaviours reminiscent of those performed by mammals. Our findings show that emergent grid-like representations furnish agents with a Euclidean spatial metric and associated vector operations, providing a foundation for proficient navigation. As such, our results support neuroscientific theories that see grid cells as critical for vector-based navigation, demonstrating that the latter can be combined with path-based strategies to support navigation in challenging environments.
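Path integration, the computation the recurrent network above was trained to perform, amounts to accumulating self-motion signals to keep a running estimate of position. A minimal sketch in two dimensions, with a deliberately simple Euler accumulation and illustrative velocity values (the paper's network learns this from egocentric inputs rather than computing it explicitly):

```python
import numpy as np

# Sketch of path integration: summing a sequence of 2-D self-motion
# (velocity) vectors to track position. Velocities and time step are
# illustrative assumptions.
def path_integrate(start, velocities, dt=1.0):
    """Integrate velocity vectors over time to estimate final position."""
    pos = np.array(start, dtype=float)
    for v in velocities:
        pos += dt * np.asarray(v, dtype=float)
    return pos

estimate = path_integrate([0.0, 0.0], [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
```

The vector from the current position estimate to a remembered goal is exactly the "metric quantity" that vector-based navigation needs, which is why a representation supporting path integration also supports direct trajectories and shortcuts.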